






THE ANNALS 

of 

MATHEMATICAL 

STATISTICS 

(FOUNDED BY H. C. CARVER)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XI 


1940 



THE ANNALS 

OF MATHEMATICAL STATISTICS 


EDITED BY 

S. S. WILKS, Editor 

A. T. CRAIG J. NEYMAN 

WITH THE COOPERATION OF 

H. C. Carver        R. A. Fisher        R. de Mises
H. Cramér           T. C. Fry           E. S. Pearson
W. E. Deming        H. Hotelling        H. L. Rietz
G. Darmois                              W. A. Shewhart

The Annals of Mathematical Statistics is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the Annals of Mathematical Statistics, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, P. R. Rider, Washington University, St.
Louis, Mo.

Manuscripts for publication in the Annals of Mathematical Statistics
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the Annals is $4.00 per year. Single copies $1.25.
Back numbers are available at the following rates:

Vols. I-IV $5.00 each. Single numbers $1.50.

Vols. V to date $4.00 each. Single numbers $1.25.



CONTENTS OF VOLUME XI 


Articles 

Brown, George W. Reduction of a Certain Class of Statistical Hypotheses. 254

Cochran, W. G. The Analysis of Variance when Experimental Errors follow the
Poisson or Binomial Laws. 335

Cox, Gertrude M. Enumeration and Construction of Balanced Incomplete Block
Configurations. 72

Craig, Cecil C. The Product Seminvariants of the Mean and a Central Moment in
Samples. 177

Daly, Joseph F. On the Unbiased Character of Likelihood-Ratio Tests for
Independence in Normal Systems. 1

Dantzig, George B. On the Non-Existence of Tests of “Student's” Hypothesis
having Power Functions Independent of σ. 186

Deming, W. Edwards and Stephan, Frederick F. On a Least Squares Adjustment of
a Sampled Frequency Table when the Expected Marginal Totals are Known. 427

Dodd, Edward L. The Substitutive Mean and Certain Subclasses of this General
Mean. 163

Dressel, Paul L. Statistical Seminvariants and their Estimates with Particular
Emphasis on their Relation to Algebraic Invariants. 33

Dwyer, P. S. The Cumulative Numbers and their Polynomials. 66

Friedman, Milton. A Comparison of Alternative Tests of Significance for the
Problem of m Rankings. 86

Geiringer, Hilda. A Generalization of the Law of Large Numbers. 393

Hoel, Paul G. The Errors Involved in Evaluating Correlation Determinants. 58

Hotelling, Harold. The Selection of Variates for Use in Prediction with some
Comments on the Problem of Nuisance Parameters. 271

Hotelling, Harold. The Teaching of Statistics. 457

Hsu, C. T. On Samples from a Normal Bivariate Population. 410

Johnson, N. L. Parabolic Test for Linkage. 227

Kendall, M. G. Conditions for Uniqueness in the Problem of Moments. 402

Madow, William G. Limiting Distributions of Quadratic and Bilinear Forms. 125

Mood, A. M. The Distribution Theory of Runs. 367

Pierce, Joseph A. A Study of a Universe of n Finite Populations with
Application to Moment-Function Adjustments for Grouped Data. 311



Reiersøl, Olav. A Method for Recurrent Computation of all of the Principal
Minors of a Determinant and its Application in Confluence Analysis. 193

Singleton, Robert R. A Method for Minimizing the Sum of Absolute Values of
Deviations. 301

Stephan, Frederick F. and Deming, W. Edwards. On a Least Squares Adjustment of
a Sampled Frequency Table when the Expected Marginal Totals are Known. 427

Wald, Abraham. The Fitting of Straight Lines if Both Variables are Subject to
Error. 284

Wald, A. and Wolfowitz, J. On a Test Whether Two Samples are from the Same
Population. 147

Wolfowitz, J. and Wald, A. On a Test Whether Two Samples are from the Same
Population. 147


Notes 

Baker, G. A. Comparison of Pearsonian Approximations with Exact Sampling
Distributions of Means and Variances in Samples from Populations Composed of
the Sums of Normal Populations. 219

Brown, A. W. A Note on the Use of a Pearson Type III Function in Renewal
Theory. 448

Bleick, W. E. A Least Squares Accumulation Theorem. 225

Cohen, A. C., Jr. The Numerical Computation of the Product of Conjugate
Imaginary Gamma Functions. 213

Cochran, W. G. Note on an Approximate Formula for Significance Levels of z. 93

Deming, W. Edwards. Discussion of Professor Hotelling's paper on the Teaching
of Statistics. 470

Dixon, W. J. A Criterion for Testing the Hypothesis that Two Samples are from
the Same Population. 199

Dwyer, P. S. Combinatorial Formulas for the rth Standard Moment of the Sample
Sum, of the Sample Mean, and of the Normal Curve. 353

Evans, W. D. Note on the Moments of a Binomially Distributed Variate. 106

Frankel, A. and Kullback, S. A Simple Sampling Experiment on Confidence
Intervals. 209

Johnson, Evan, Jr. Estimates of Parameters by Means of Least Squares. 453

Kimball, Bradford F. Orthogonal Polynomials Applied to Least Square Fitting of
Weighted Observations. 348

Kullback, S. and Frankel, A. A Simple Sampling Experiment on Confidence
Intervals. 209

Madow, William G. The Distribution of Quadratic Forms in Non-Central Normal
Random Variables. 100

Mauchly, John W. Significance Test for Sphericity of a Normal n-variate
Distribution. 204























Norris, Nilan. The Standard Errors of the Geometric Means and their
Application to Index Numbers. 445

Olds, E. G. On a Method of Sampling. 355

Olmstead, P. S. Note on Theoretical and Observed Distributions of Repetitive
Occurrences. 363

Wald, Abraham. A Note on the Analysis of Variance with Unequal Class
Frequencies. 96

Woodbury, Max A. Rank Correlation when there are Equal Variates. 358

Miscellaneous

Abstracts of Papers. 110, 475

Constitution and By-Laws of the Institute. 116

Directory of the Institute. 120

Report of the Philadelphia Meeting of the Institute. 108

Report of the Hanover Meeting of the Institute. 473

Report of the War Preparedness Committee of the Institute. 479

Resolutions of the Institute on the Teaching of Statistics. 472















ON THE UNBIASED CHARACTER OF LIKELIHOOD-RATIO TESTS
FOR INDEPENDENCE IN NORMAL SYSTEMS 

By Joseph F. Daly 


1. Introduction. In the statistical interpretation of experimental data, the
basic assumption is, of course, that we are dealing with a sample from a statistical
population, the elements of which are characterized by the values of a number of
random variables $x^1, \cdots, x^k$. But in many cases we are in a position to assume
even more, namely, that the population has an elementary probability law
$f(x^1, \cdots, x^k; \theta_1, \cdots, \theta_h)$, where the functional form of $f(x, \theta)$ is definitely
specified, although the parameters $\theta_1, \cdots, \theta_h$ are to be left free for the moment
to have values corresponding to any point of a set $\Omega$ in an $h$-dimensional space.

Under this assumption, the problem of obtaining from the data further
information about the hypothetical distribution law $f(x, \theta)$ is considerably simplified.
For it is then equivalent to that of deciding whether or not the data support the
hypothesis that the population values of the $\theta$'s correspond to a point in a certain
subset $\omega$ of $\Omega$. For example, we may have reason to believe that the population
$K$ has a distribution law of the form

$$f(x^1, x^2; a^1, a^2, A_{11}, A_{12}, A_{22}) = \frac{|A_{ij}|^{\frac{1}{2}}}{2\pi}\, e^{-\frac{1}{2}\sum_{i,j=1}^{2} A_{ij}(x^i - a^i)(x^j - a^j)}.$$


Here the set $\Omega$ is composed of all parameter points $(a^1, \cdots, A_{22})$ for which the
matrix $\| A_{ij} \|$ $(i, j = 1, 2)$ is positive definite and for which $-\infty < a^i < \infty$.
We may wish to decide, on the basis of $N$ independent observations $(x^1_\alpha, x^2_\alpha)$
drawn from $K$, whether $A_{12}$ has the value zero for the population in question,
without concerning ourselves at all about the values of the remaining
parameters; in other words, we may wish to test the hypothesis $H$ that the parameter
point corresponding to $K$ lies in that subset of $\Omega$ for which $A_{12} = 0$. One way to
test this hypothesis is to select some (measurable) function $g(x)$ whose value can
be determined from the data, say

$$g(x) = \frac{\sum_{\alpha=1}^{N} (x^1_\alpha - \bar{x}^1)(x^2_\alpha - \bar{x}^2)}{\sqrt{\sum_{\alpha=1}^{N} (x^1_\alpha - \bar{x}^1)^2 \, \sum_{\alpha=1}^{N} (x^2_\alpha - \bar{x}^2)^2}}.$$


Now $g(x)$ is itself a random variable, so that it has a distribution law of its own
when its constituent $x$'s are drawn from any particular population $K$. Suppose
then we choose a set of values of $g(x)$, say $S$, such that the probability is only .05
that $g(x)$ will lie in the set $S$ when the $x$'s are drawn independently from a
population $K$ for which the above hypothesis $H$ is true. Ordinarily we would
take $S$ to be of the form $|g(x)| > g_0$, and the test would then reject $H$ at the .05
probability level if the computed value of $g(x)$ came out too large in absolute
value. But for all that has been said so far, we are perfectly free to choose a
different critical region $S$, and even a different function $g(x)$. The essential
elements of this type of test are then a critical region $S$, a function of the data $g$,
and a probability level $\epsilon$, such that the probability is $\epsilon = .05$, say, that $g \subset S$
when $H$ is true; in employing the test we reject $H$ at the given probability level
whenever the sample value of $g$ falls in the critical region.
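
As an illustration that is not part of the original paper, the following Python sketch simulates the test just described. It assumes $N = 4$, for which the sample correlation coefficient is uniformly distributed on $(-1, 1)$ when $A_{12} = 0$, so that $g_0 = .95$ gives a test at the .05 level. The function names, the seed, and the alternative correlation .8 are choices made here for illustration only.

```python
import math
import random

def sample_correlation(xs, ys):
    """Sample correlation coefficient g(x) of the paired observations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rejection_rate(rho, n_obs, g0, trials, rng):
    """Fraction of simulated samples with |g| > g0 when the population
    correlation is rho (unit variances, zero means; an illustrative setup)."""
    rejections = 0
    for _ in range(trials):
        xs, ys = [], []
        for _ in range(n_obs):
            u = rng.gauss(0.0, 1.0)
            v = rng.gauss(0.0, 1.0)
            xs.append(u)
            ys.append(rho * u + math.sqrt(1.0 - rho * rho) * v)
        if abs(sample_correlation(xs, ys)) > g0:
            rejections += 1
    return rejections / trials

rng = random.Random(1940)
# Under H (rho = 0) the rejection rate should be close to the level .05.
size = rejection_rate(0.0, 4, 0.95, 20000, rng)
# Under an alternative (rho = 0.8) the same region is hit more often.
power = rejection_rate(0.8, 4, 0.95, 20000, rng)
print(size, power)
```

The first rate estimates the probability of an error of Type I; the second estimates the probability of rejection under one particular alternative.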

By the very nature of the problem, any inferences we make from a sample are
subject to possible error. In the kind of test under consideration, the only error
we can commit, strictly speaking, is that of rejecting $H$ when it is true (an error of
Type I in the terminology of Neyman and Pearson [9]). The risk of such an
error is thus known in advance; for if we use the test consistently at, say, the .05
level, we know that the probability is .05 that we shall be led to reject a given
hypothesis when it is true. On the other hand, it is quite conceivable that the
test may be even less likely to reject $H$ when it is false, or more precisely, when
the true $\theta$'s correspond to a point of $\Omega$ which is not in $\omega$. In this event the test is
said to be biased. Let us make this term more definite by proposing the
following definitions:

Definition I. A test is said to be completely unbiased if it has the property
that for any probability level $\epsilon$ $(0 < \epsilon < 1)$ the probability of rejecting $H$ is greater
when the $\theta$'s correspond to a point of $\Omega - \omega$ than when they correspond to a point of $\omega$.

Definition II. A test is said to be locally unbiased if the set $\Omega$ contains a
neighborhood $U$ of $\omega$ such that for any probability level $\epsilon$ $(0 < \epsilon < 1)$ the probability
of rejecting $H$ is greater when the parameter values correspond to a point of $U - \omega$
than when they correspond to a point of $\omega$.

It is the purpose of this paper to consider the question of bias in connection
with the Neyman-Pearson method of likelihood ratios [8] as applied to the
testing of what may well be called hypotheses of independence in multivariate
normal populations. The likelihood ratio method is undoubtedly a very familiar
one, since the vast majority of tests in present statistical practice are based on
this method. But for the sake of completeness we shall outline it briefly. Let
the distribution law of the population $K$ be of the form $f(x^1, \cdots, x^k; \theta_1, \cdots, \theta_h)$
where the $\theta$'s may correspond to any point in a set $\Omega$, and let the hypothesis $H$
to be tested be that the $\theta$'s actually belong to the subset $\omega$ of $\Omega$. Form the
likelihood function

$$P_N(x, \theta) = \prod_{\alpha=1}^{N} f(x^1_\alpha, \cdots, x^k_\alpha; \theta_1, \cdots, \theta_h),$$

i.e., the elementary probability law of a sample of $N$ elements drawn
independently from $K$. Denote by $P_\Omega(x)$ the maximum of $P_N$ for fixed $x$ where the
$\theta$'s are allowed to range over $\Omega$; and denote by $P_\omega(x)$ the corresponding maximum
value when the $\theta$'s are restricted to $\omega$. The test criterion is then

$$\lambda = \frac{P_\omega(x)}{P_\Omega(x)}.$$



Evidently $\lambda$ depends only on the observable quantities $x^i_\alpha$, and has the range
$0 \le \lambda \le 1$, with a definite probability law depending on that of the basic
population $K$. In this method the critical region $S$ is taken to be $0 \le \lambda \le \lambda_\epsilon$, where
$\lambda_\epsilon$ is so chosen that the probability $P\{\lambda \le \lambda_\epsilon\}$ is $\epsilon$ when the parameters of $K$
correspond to a point in $\omega$. (It may be noted here that in all the cases with
which we shall have to deal the probability that $\lambda$ lies in $S$ when $H$ is true is
independent of the particular values of the $\theta$'s as long as they correspond to a
point of $\omega$.) The reason for taking the critical region to be of the form $0 \le \lambda \le \lambda_\epsilon$
and not, say, $\lambda'_\epsilon \le \lambda \le \lambda''_\epsilon$ or $\lambda_\epsilon \le \lambda \le 1$ may become clearer when we examine
the resulting tests for bias.
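
The criterion can be made concrete for the bivariate example of the Introduction. The Python sketch below is an illustration added here, not part of the original text. It evaluates the normal likelihood at its maximum over $\Omega$ and over $\omega$ (where $A_{12} = 0$) and compares the ratio with the closed form $(1 - g^2)^{N/2}$, $g$ being the sample correlation coefficient. The simulated data, the seed, and the helper names are assumptions made for the example.

```python
import math
import random

def sample_moments(xs, ys):
    """Maximum likelihood estimates of the means, variances and covariance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    s11 = sum((x - mx) ** 2 for x in xs) / n
    s22 = sum((y - my) ** 2 for y in ys) / n
    s12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, s11, s12, s22

def log_likelihood(xs, ys, mx, my, s11, s12, s22):
    """Log of the bivariate normal likelihood with the given means and covariances."""
    det = s11 * s22 - s12 * s12
    total = 0.0
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        quad = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total

def likelihood_ratio(xs, ys):
    """lambda = (maximum over omega) / (maximum over Omega)."""
    mx, my, s11, s12, s22 = sample_moments(xs, ys)
    log_max_big = log_likelihood(xs, ys, mx, my, s11, s12, s22)
    # Under omega the covariance is zero; the other estimates are unchanged.
    log_max_small = log_likelihood(xs, ys, mx, my, s11, 0.0, s22)
    return math.exp(log_max_small - log_max_big)

rng = random.Random(7)
xs = [rng.gauss(0.0, 1.0) for _ in range(25)]
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

lam = likelihood_ratio(xs, ys)
_, _, s11, s12, s22 = sample_moments(xs, ys)
closed_form = (1.0 - s12 * s12 / (s11 * s22)) ** (len(xs) / 2.0)
print(lam, closed_form)
```

Small values of $\lambda$ thus correspond to large values of $|g|$, so the region $0 \le \lambda \le \lambda_\epsilon$ is the region $|g| > g_0$ considered earlier.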

The recent work of Neyman and Pearson [10] has led them to lay considerable
stress on the importance of unbiased tests. And though their attention has been
directed mainly to the broader outlines of the theory of testing hypotheses,
they have stimulated other writers to study particular tests of great practical
importance. P. C. Tang [11] has obtained the general sampling distribution of
$1 - \lambda^{2/N}$ for what we shall call the regression problem with one dependent variate,
and has given tables for $P\{\lambda \le \lambda_\epsilon\}$ (essentially proving the unbiased character
of the test) which should be extremely useful. His article also contains an
excellent discussion of the manner in which this test is related to the well known
tests of linear hypotheses [7] and to the ordinary analysis of variance. P. L.
Hsu [6] has shown that this same distribution is fundamental in the study of
Hotelling's generalized $T$ test [5] (a special but important case of what we shall
call the general regression problem), and has proved that (locally) this test is
not only unbiased but “most powerful” in a certain sense. On the other hand,
it is not true that all likelihood ratio tests are unbiased [2]. Consequently, the
knowledge that in a rather wide class of problems which arise in normal sampling
theory the method of likelihood ratios furnishes tests which are either locally or
completely unbiased would seem to be of some value, even when the exact
sampling distribution of the criterion is too complicated to tabulate.

2. The regression problem with one dependent variate. Suppose that $y$ is
known to be normally distributed about a linear function of the fixed variables
$x^1, \cdots, x^r$, so that the family of populations under consideration is characterized
by a distribution function of the form

(2.1)    $$f(y \mid x, b, \sigma^2) = (2\pi\sigma^2)^{-\frac{1}{2}}\, e^{-\frac{1}{2\sigma^2}\left(y - \sum_{i=1}^{r} b_i x^i\right)^2}$$

where the set of admissible values of $\sigma^2$ and the $b$'s is

$\Omega$: $0 < \sigma^2 < \infty$, $-\infty < b_i < \infty$.

Let $H$ be the hypothesis that the point $(\sigma^2, b_1, \cdots, b_r)$ lies in the subset of $\Omega$
defined by

$\omega$: $b_{q+1} = b_{q+2} = \cdots = b_r = 0$.




The likelihood ratio appropriate to testing the hypothesis $H$ on the basis of $N$
($N > r$) independent observations drawn from such a population is then

$$\lambda = \left[\frac{\min_{\Omega} \sum_{\alpha=1}^{N}\left(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\right)^2}{\min_{\omega} \sum_{\alpha=1}^{N}\left(y_\alpha - \sum_{i=1}^{q} b_i x^i_\alpha\right)^2}\right]^{N/2}$$

with the understanding that the values of the fixed variables $x^1_\alpha, \cdots, x^r_\alpha$
associated with the $\alpha$-th observation have been so chosen that the matrix
$\| a^{ij} \| = \left\| \sum_{\alpha=1}^{N} x^i_\alpha x^j_\alpha \right\|$ is positive definite. (The expression in the numerator is the
minimum of $\sum_{\alpha=1}^{N}\left(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\right)^2$ for variations of the $b$'s over $\Omega$, while the
denominator contains the corresponding minimum for variations of the $b$'s over $\omega$.)

In order to show that the test is unbiased, we shall make use of the exact
sampling distribution of the quantity

$$\xi = 1 - \lambda^{2/N},$$

first published by P. C. Tang [11]. Writing $\| A_{gh} \|$ for the inverse of the
matrix $\| a^{gh} \|$ composed of the first $q$ rows and columns of $\| a^{ij} \|$, let us put

$$G = \frac{1}{2\sigma^2} \sum_{k,l=q+1}^{r} \left(a^{kl} - \sum_{g,h=1}^{q} a^{kg} A_{gh} a^{hl}\right) b_k b_l.$$


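Since the critical region $0 \le \lambda \le \lambda_\epsilon$ corresponds to the region $1 - \lambda_\epsilon^{2/N} = \xi_\epsilon \le
\xi \le 1$, it can then be shown that the probability of rejecting $H$ when the
population parameters have specified values $\sigma^2, b_1, \cdots, b_r$ is expressed by the series

(2.2)    $$I(G, \xi_\epsilon) = e^{-G} \sum_{\nu=0}^{\infty} \frac{G^\nu}{\nu!} \int_{\xi_\epsilon}^{1} \frac{\xi^{\frac{1}{2}(r-q)+\nu-1}(1-\xi)^{\frac{1}{2}(N-r)-1}}{B[\frac{1}{2}(r-q)+\nu, \frac{1}{2}(N-r)]}\, d\xi,$$

where

$$B(u, v) = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)} = \int_0^1 z^{u-1}(1-z)^{v-1}\, dz.$$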


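Now $G$ is a positive definite quadratic form in the parameters $b_{q+1}, \cdots, b_r$, so
that it vanishes if and only if the hypothesis is true. And if $0 < \xi_\epsilon < 1$, then
$I(G, \xi_\epsilon)$ is a monotone increasing function of $G$. For by differentiating (2.2)
we obtain

(2.3)    $$\frac{\partial}{\partial G} I(G, \xi_\epsilon) = e^{-G} \sum_{\nu=0}^{\infty} \frac{G^\nu}{\nu!} \int_{\xi_\epsilon}^{1} (1-\xi)^{\frac{1}{2}(N-r)-1} \left\{ \frac{\xi^{\frac{1}{2}(r-q)+\nu}}{B[\frac{1}{2}(r-q)+\nu+1, \frac{1}{2}(N-r)]} - \frac{\xi^{\frac{1}{2}(r-q)+\nu-1}}{B[\frac{1}{2}(r-q)+\nu, \frac{1}{2}(N-r)]} \right\} d\xi.$$

And from a property of incomplete Beta functions, which we shall demonstrate
in the next section, it follows that each term in the series (2.3) is positive.
Accordingly we have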





Theorem I. The likelihood ratio test for the hypothesis that in a population of
type (2.1) certain of the regression coefficients are zero, i.e., the hypothesis that $y$ is
independent of the fixed variables $x^{q+1}, \cdots, x^r$, is completely unbiased.
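
The monotone behaviour asserted in Theorem I can be examined numerically. The sketch below is an illustration added here and is not part of the original paper. It evaluates the rejection probability as a Poisson-weighted sum of upper tails of Beta distributions, which is the form of series (2.2), using Simpson's rule for the tail integrals. The values $r - q = 2$ and $N - r = 10$, the truncation at 80 terms, and the function names are assumptions chosen for the example. With these values the tail for $G = 0$ is $(1 - \xi_\epsilon)^5$, so that $\xi_\epsilon = 1 - .05^{1/5}$ gives the .05 level.

```python
import math

def upper_beta(u, v, t, steps=400):
    """Simpson approximation to the integral from t to 1 of
    z^(u-1) (1-z)^(v-1) dz; assumes u >= 1, v >= 1 and an even `steps`."""
    if t >= 1.0:
        return 0.0
    h = (1.0 - t) / steps
    total = 0.0
    for k in range(steps + 1):
        z = min(1.0, t + k * h)  # guard against rounding past 1
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * z ** (u - 1.0) * (1.0 - z) ** (v - 1.0)
    return total * h / 3.0

def complete_beta(u, v):
    """B(u, v) = Gamma(u) Gamma(v) / Gamma(u + v)."""
    return math.exp(math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v))

def rejection_probability(G, xi_eps, r_minus_q, n_minus_r, terms=80):
    """Truncated series: sum over nu of e^(-G) G^nu / nu! times the
    upper tail of a Beta(r_minus_q/2 + nu, n_minus_r/2) law at xi_eps."""
    weight = math.exp(-G)  # Poisson weight for nu = 0
    total = 0.0
    v = 0.5 * n_minus_r
    for nu in range(terms):
        u = 0.5 * r_minus_q + nu
        total += weight * upper_beta(u, v, xi_eps) / complete_beta(u, v)
        weight *= G / (nu + 1)  # next Poisson weight
    return total

xi_eps = 1.0 - 0.05 ** (1.0 / 5.0)
powers = [rejection_probability(G, xi_eps, 2, 10) for G in (0.0, 0.5, 1.0, 2.0, 5.0)]
print(powers)
```

The first entry is the level of the test, and the later entries increase with $G$, in agreement with the theorem for this particular choice of constants.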

Wilks [15] has noted that the ordinary analysis of variance and covariance
amounts essentially to testing hypotheses of this nature by means of the function

$$\zeta = \frac{1 - \lambda^{2/N}}{\lambda^{2/N}}.$$

Consequently such tests are also completely unbiased, since the region of
rejection is then taken to be of the form $\zeta > \zeta_\epsilon$.


3. An inequality relating to incomplete Beta functions. Let us write

$$B(u, v; t) = \int_t^1 z^{u-1}(1-z)^{v-1}\, dz \qquad (0 \le t \le 1).$$

Now,

$$B(u, v+1; t) = -\frac{1}{u}\, t^u (1-t)^v + \frac{v}{u} \int_t^1 z^u (1-z)^{v-1}\, dz.$$

The integrated term on the right is non-positive, so that

(3.1)    $$B(u, v+1; t) \le \frac{v}{u}\, B(u+1, v; t)$$

in which the equality holds if and only if $t = 0$ or $t = 1$. Again, since

$$z^u (1-z)^{v-1} + z^{u-1}(1-z)^v = z^{u-1}(1-z)^{v-1},$$

we have

(3.2)    $$B(u+1, v; t) + B(u, v+1; t) = B(u, v; t).$$

Combining these results, we find that

(3.3)    $$\frac{u+v}{u}\, B(u+1, v; t) \ge B(u, v; t)$$

with equality only when $t = 0$ or $t = 1$. Hence, since $B(u+1, v) = \frac{u}{u+v}\, B(u, v)$, we have

Lemma 1: If $0 < t < 1$, then

$$\frac{B(u+1, v; t)}{B(u+1, v)} > \frac{B(u, v; t)}{B(u, v)}.$$
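
A numerical check of the lemma may be useful. The following Python sketch is added here for illustration only. It takes $B(u, v; t)$ to be the integral of $z^{u-1}(1-z)^{v-1}$ from $t$ to 1, approximates it by Simpson's rule, and verifies identity (3.2) and the inequality of Lemma 1 on a small grid. The grid values and the restriction to $u \ge 1$, $v \ge 1$ (which keeps the integrand bounded) are assumptions of the sketch, not of the lemma.

```python
import math

def upper_beta(u, v, t, steps=2000):
    """Simpson approximation to B(u, v; t), the integral from t to 1 of
    z^(u-1) (1-z)^(v-1) dz; assumes u >= 1, v >= 1 and an even `steps`."""
    if t >= 1.0:
        return 0.0
    h = (1.0 - t) / steps
    total = 0.0
    for k in range(steps + 1):
        z = min(1.0, t + k * h)  # guard against rounding past 1
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * z ** (u - 1.0) * (1.0 - z) ** (v - 1.0)
    return total * h / 3.0

def complete_beta(u, v):
    """B(u, v) = Gamma(u) Gamma(v) / Gamma(u + v)."""
    return math.exp(math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v))

def tail_ratio(u, v, t):
    """B(u, v; t) / B(u, v), the quantity compared in Lemma 1."""
    return upper_beta(u, v, t) / complete_beta(u, v)

identity_errors = []
lemma_holds = []
for u in (1.0, 2.5, 4.0):
    for v in (1.0, 3.0, 5.5):
        for t in (0.1, 0.5, 0.9):
            # Identity (3.2): B(u+1, v; t) + B(u, v+1; t) = B(u, v; t).
            lhs = upper_beta(u + 1.0, v, t) + upper_beta(u, v + 1.0, t)
            identity_errors.append(abs(lhs - upper_beta(u, v, t)))
            # Lemma 1: the ratio increases when u is replaced by u + 1.
            lemma_holds.append(tail_ratio(u + 1.0, v, t) > tail_ratio(u, v, t))

print(max(identity_errors), all(lemma_holds))
```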


4. The multiple correlation coefficient. Suppose the distribution law of the
underlying population is known to be of the form

(4.1)    $$f(x^1, \cdots, x^t \mid x^{t+1}, \cdots, x^m) = \frac{|B_{ij}|^{\frac{1}{2}}}{\pi^{t/2}}\, e^{-B_{ij}(x^i - a^i - C^i_p x^p)(x^j - a^j - C^j_q x^q)}.$$

The indices appearing in this expression take the values $i, j = 1, \cdots, t$ and
$p, q = t+1, \cdots, m$. The summation convention of repeated indices will be
used, for example, $\sum_p C^i_p x^p$ will be denoted by $C^i_p x^p$. We shall also have
occasion to use indices $r, s$ with the range $r, s = 1, \cdots, m$. The set of possible values
of the $a$'s, $B$'s, and $C$'s is

$\Omega$: $\| B_{ij} \|$ positive definite; $-\infty < a^i < \infty$; $-\infty < C^i_p < \infty$.

We shall consider the $\lambda$ test for the hypothesis $H$ that $x^1$ is independent of the
remaining variables $x^2, \cdots, x^m$, i.e., that the parameters belong to that subset of
$\Omega$ defined by

$\omega$: $B_{1k} = 0$ $(k = 2, \cdots, t)$; $C^1_p = 0$.

Let us write $v^{rs} = \sum_{\alpha=1}^{N} (x^r_\alpha - \bar{x}^r)(x^s_\alpha - \bar{x}^s)$, and assume that the values of the
fixed variables $x^p_\alpha$ have been so selected that the matrix $\| v^{pq} \|$ is positive
definite. The likelihood ratio can then be expressed in the form

$$\lambda = \left[\frac{|v^{rs}|}{v^{11} v_{11}}\right]^{N/2} = (1 - R^2)^{N/2},$$

where $v_{11}$ is the complement of $v^{11}$ in the determinant $|v^{rs}|$. If $N > m + 1$,
the general sampling distribution of $R^2$ (where $R$ is the multiple correlation
coefficient between $x^1$ and $m - 1$ other variates), for this case in which
$x^2, \cdots, x^t$ are subject to sampling variation and the remainder are fixed, is


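(4.2)    $$\frac{(1-\rho^2)^{\frac{1}{2}(N-1)}\, e^{-\eta^2} (1-R^2)^{\frac{1}{2}(N-m)-1}}{\Gamma[\frac{1}{2}(N-m)]} \sum_{\mu=0}^{\infty} \sum_{\nu=0}^{\infty} \frac{(\eta^2)^\mu (1-\rho^2)^\mu (\rho^2)^\nu (R^2)^{\frac{1}{2}(m-1)+\mu+\nu-1}\, \Gamma^2[\frac{1}{2}(N-1)+\mu+\nu]}{\mu!\, \nu!\, \Gamma[\frac{1}{2}(N-1)+\mu]\, \Gamma[\frac{1}{2}(m-1)+\mu+\nu]}\, d(R^2),$$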

where 


1 


\M 

Bn# 1 ’ 


V 


v r 1 r 1 




This distribution was first obtained by Wilks [13], although Fisher [3] had
previously treated the two extreme cases in which (1) all independent variables
are subject to sampling fluctuation, and (2) all independent variables are fixed.

To simplify the presentation, let us put $\bar{\rho} = \rho^2$, $\bar{\eta} = \eta^2$, and $\bar{R} = R^2$, and note
that $\bar{\eta} = 0$ if and only if $C^1_p = 0$ $(p = t+1, \cdots, m)$ while $\bar{\rho} = 0$ if and only if
$B_{1k} = 0$ $(k = 2, \cdots, t)$, so that $\bar{\eta} = \bar{\rho} = 0$ means that the hypothesis $H$ is true.
On any alternative hypothesis, one or the other or both of these quantities will
be positive. Let the region of rejection be taken to be

$$\bar{R}_\epsilon \le \bar{R} \le 1,$$

which corresponds to

$$0 \le \lambda \le (1 - \bar{R}_\epsilon)^{N/2}.$$



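The probability of rejecting $H$ is then

(4.4)    $$I(\bar{\rho}, \bar{\eta}, \bar{R}_\epsilon) = (1-\bar{\rho})^{\frac{1}{2}(N-1)}\, e^{-\bar{\eta}} \sum_{\mu=0}^{\infty} \sum_{\nu=0}^{\infty} \frac{\bar{\eta}^\mu (1-\bar{\rho})^\mu \bar{\rho}^\nu}{\mu!\, \nu!}\, \frac{\Gamma[\frac{1}{2}(N-1)+\mu+\nu]}{\Gamma[\frac{1}{2}(N-1)+\mu]}\, \frac{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m)]}.$$

We shall show that $I(\bar{\rho}, \bar{\eta}, \bar{R}_\epsilon)$ is a strictly monotone increasing function of $\bar{\rho}$
for each $\bar{\eta}$, and that $I(0, \bar{\eta}, \bar{R}_\epsilon)$ is a strictly monotone increasing function of $\bar{\eta}$.

First consider $\dfrac{\partial I}{\partial \bar{\rho}}$. We can write (4.4) in the form

$$I(\bar{\rho}, \bar{\eta}, \bar{R}_\epsilon) = e^{-\bar{\eta}} \sum_{\mu=0}^{\infty} \frac{\bar{\eta}^\mu}{\mu!\, \Gamma[\frac{1}{2}(N-1)+\mu]} \sum_{\nu=0}^{\infty} \frac{\bar{\rho}^\nu}{\nu!} (1-\bar{\rho})^{\frac{1}{2}(N-1)+\mu}\, w_{\mu,\nu},$$

where

$$w_{\mu,\nu} = \Gamma[\tfrac{1}{2}(N-1)+\mu+\nu]\, \frac{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m)]}.$$

Then, formally,

$$\frac{\partial}{\partial \bar{\rho}} \sum_{\nu=0}^{\infty} \frac{\bar{\rho}^\nu}{\nu!} (1-\bar{\rho})^{\frac{1}{2}(N-1)+\mu} w_{\mu,\nu} = \sum_{\nu=1}^{\infty} \frac{\bar{\rho}^{\nu-1}}{(\nu-1)!} (1-\bar{\rho})^{\frac{1}{2}(N-1)+\mu} w_{\mu,\nu} - \sum_{\nu=0}^{\infty} \frac{\bar{\rho}^\nu}{\nu!} (1-\bar{\rho})^{\frac{1}{2}(N-1)+\mu-1} [\tfrac{1}{2}(N-1)+\mu]\, w_{\mu,\nu}.$$

Taking out the factor $(1-\bar{\rho})^{\frac{1}{2}(N-1)+\mu-1}$, we have left

$$\sum_{\nu=0}^{\infty} \frac{\bar{\rho}^\nu}{\nu!} w_{\mu,\nu+1} - \sum_{\nu=0}^{\infty} \frac{\bar{\rho}^{\nu+1}}{\nu!} w_{\mu,\nu+1} - \sum_{\nu=0}^{\infty} \frac{\bar{\rho}^\nu}{\nu!} [\tfrac{1}{2}(N-1)+\mu]\, w_{\mu,\nu} = \sum_{\nu=0}^{\infty} \frac{\bar{\rho}^\nu}{\nu!} \left\{ w_{\mu,\nu+1} - [\tfrac{1}{2}(N-1)+\mu+\nu]\, w_{\mu,\nu} \right\}.$$

And the expression $w_{\mu,\nu+1} - [\frac{1}{2}(N-1)+\mu+\nu]\, w_{\mu,\nu}$ is the same as

$$\Gamma[\tfrac{1}{2}(N-1)+\mu+\nu+1] \left\{ \frac{B[\frac{1}{2}(m-1)+\mu+\nu+1, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu+1, \frac{1}{2}(N-m)]} - \frac{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m)]} \right\}$$

and is therefore positive, by Lemma 1. Consequently

$$\frac{\partial}{\partial \bar{\rho}} I(\bar{\rho}, \bar{\eta}, \bar{R}_\epsilon) \ge 0,$$

with equality holding only if $\bar{\rho} = 1$, or if the critical region is taken as the whole
interval or the null set.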





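We have yet to investigate $\dfrac{\partial}{\partial \bar{\eta}} I(0, \bar{\eta}, \bar{R}_\epsilon)$. In this case (4.4) becomes

(4.5)    $$I(0, \bar{\eta}, \bar{R}_\epsilon) = e^{-\bar{\eta}} \sum_{\mu=0}^{\infty} \frac{\bar{\eta}^\mu}{\mu!}\, \frac{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m)]}.$$

(Note that this agrees with (2.2) if we make use of the relations $r = m$, $q = 1$,
and $B^{11} = 2\sigma^2$.) We then obtain

$$\frac{\partial}{\partial \bar{\eta}} I(0, \bar{\eta}, \bar{R}_\epsilon) = e^{-\bar{\eta}} \sum_{\mu=0}^{\infty} \frac{\bar{\eta}^\mu}{\mu!} \left\{ \frac{B[\frac{1}{2}(m-1)+\mu+1, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+1, \frac{1}{2}(N-m)]} - \frac{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m); \bar{R}_\epsilon]}{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m)]} \right\}$$

which the lemma shows to be positive when $0 < \bar{R}_\epsilon < 1$.

This concludes the proof of

Theorem II. If the underlying population has a distribution law of the form
(4.1), then the likelihood ratio test for the hypothesis that $x^1$ is independent of
$x^2, \cdots, x^m$, where $x^{t+1}, \cdots, x^m$ are fixed and $x^2, \cdots, x^t$ are subject to sampling
variation, is completely unbiased.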

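5. Mutual independence of several sets of random variables.$^1$ Let the
distribution law of the $m$-variate population be of the form

(5.1)    $$f(x^1, \cdots, x^m) = \frac{|B_{ij}|^{\frac{1}{2}}}{\pi^{m/2}}\, e^{-B_{ij}(x^i - a^i)(x^j - a^j)}.$$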


Here $\Omega$ is the set $\| B_{ij} \|$ positive definite; $-\infty < a^i < \infty$. Suppose we wish to
test the hypothesis $H_I$ that the variates $\{x^1, \cdots, x^{m_1}\}, \cdots, \{x^{m_{p-1}+1}, \cdots, x^{m_p}\}$
are mutually independent in sets [14], where $0 = m_0 < m_1 < \cdots < m_p = m$.
Then the $\omega$ set is that defined by

$$\| B_{ij} \| = \| B_{i_1 j_1} \| \dotplus \cdots \dotplus \| B_{i_p j_p} \| = \| B_1 \| \dotplus \cdots \dotplus \| B_p \|,$$

that is, we have $B_{ij} = 0$ unless the indices $i$ and $j$ both relate to the same set of
variates.

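Associated with the population of random samples $O_N$ $(N > m + 1)$ drawn
from a universe characterized by (5.1), we have the distribution function

$$P(x; B, a) = \frac{|B_{ij}|^{N/2}}{\pi^{mN/2}}\, e^{-\sum_{\alpha=1}^{N} B_{ij}(x^i_\alpha - a^i)(x^j_\alpha - a^j)}.$$

The maximum of $P$ with respect to variations of the parameters $B_{ij}$, $a^i$ in $\Omega$ is

$$P_\Omega = \left(\frac{N}{2\pi e}\right)^{mN/2} v^{-N/2},$$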


1 In this and in subsequent sections an index occurring both above and below indicates 
summation in accordance with the usual convention. 




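where $v$ denotes the determinant $|v^{ij}|$ and

$$v^{ij} = \sum_{\alpha=1}^{N} (x^i_\alpha - \bar{x}^i)(x^j_\alpha - \bar{x}^j).$$

And the maximum when the parameters are restricted to $\omega$ is

$$P_\omega = \left(\frac{N}{2\pi e}\right)^{mN/2} (v_1 v_2 \cdots v_p)^{-N/2},$$

where $v_\mu$ stands for the determinant of the $v$'s connected with the $\mu$-th set of $x$'s.
Thus the appropriate likelihood ratio is given by

$$\lambda_I = \frac{v^{N/2}}{v_1^{N/2} \cdots v_p^{N/2}}.$$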

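It is easy to see that the value of $\lambda_I$ is unaltered if we replace $x^i_\alpha - a^i$ by $x^i_\alpha$,
so that we can express the probability that $\lambda_I$ will lie between 0 and $\lambda_\epsilon$ in the form

$$I(B, \lambda_\epsilon) = \frac{|B_{ij}|^{N/2}}{\pi^{mN/2}} \int_{\lambda_I \le \lambda_\epsilon} e^{-\sum_{\alpha=1}^{N} B_{ij} x^i_\alpha x^j_\alpha}\, dx^1_1 \cdots dx^m_N.$$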

Furthermore, $\lambda_I$ is invariant under the operation of replacing any $x$ by a linear
combination of $x$'s belonging to the same set. And since the assumption that
$\| B_{ij} \|$ is positive definite implies that the matrices $\| B_{i_\mu j_\mu} \|$ have the same
property, we can transform the $x$'s in each set among themselves by orthogonal
transformations in such a way as to reduce each of the expressions

$$B_{i_\mu j_\mu} x^{i_\mu} x^{j_\mu}$$

to sums of squares. Thus we have


41 

r *in , - 2 a* .*i*i 

(5.2) I(B, X.) = — Jf ^ « "- 1 dx\ . dx m s - I(B X.), 


where 

(5.3) Bi^j r — &i*Bhpk r otil (hp, i„, jn , kp — wi M _i H- 1, • • •, fit/,), 

(5.4) B*, u = 0 I,*],, 


and the subscripts on the indices indicate the sets of values over which they
range; e.g., $i_\mu$ runs over the numbers corresponding to the columns of the matrix
$\| B_\mu \|$. From (5.3) and (5.4) it is clear that $\| B^*_{ij} \|$ reduces to a diagonal
matrix when $H$ is true.
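
The invariance property used in this reduction can be checked numerically. The Python sketch below is an illustration added here, not part of the original paper. It assumes that the criterion has the determinantal form $\lambda_I^{2/N} = v / (v_1 \cdots v_p)$, with $v$ the determinant of the matrix of sums of products about the sample means and $v_\mu$ the corresponding determinant for the $\mu$-th set. The data, the partition into sets, and the helper names are hypothetical.

```python
import random

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def scatter_matrix(columns):
    """Sums of products of deviations from the sample means."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]

def independence_statistic(columns, sets):
    """v / (v_1 ... v_p), where `sets` lists the variate indices of each set."""
    v = scatter_matrix(columns)
    denominator = 1.0
    for idx in sets:
        denominator *= determinant([[v[i][j] for j in idx] for i in idx])
    return determinant(v) / denominator

rng = random.Random(11)
N = 20
x = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(3)]
sets = [[0, 1], [2]]  # first set: two variates; second set: one variate

before = independence_statistic(x, sets)
# Replace the first variate by a linear combination within its own set.
y = [[2.0 * a + 3.0 * b for a, b in zip(x[0], x[1])], x[1], x[2]]
after = independence_statistic(y, sets)
print(before, after)
```

The two printed values agree up to rounding, since the transformation multiplies $v$ and $v_1$ by the same factor.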

In order to show that the test is locally unbiased, we may consider the
derivatives

/(B ‘ Xl> ).' ’‘•O.’ "* T > 


\>$U. 





for the $B^*$'s are linear functions of the $B$'s; and the positive definiteness of one
matrix of second partials implies that of the other. We have at once


/ bB* \ ( d*B* \ 


(m 9* v, <r s* t) 


unless the second derivative is taken twice with respect to the same B*. Thus 



d *»" e *r 
-2~ m J 1£.x % /x\ 


- 2 


,’e 


dx, 


where the $B^*$ indicates that the $B$'s have the diagonal form associated with $H$.
And since whenever the point $x^1_1, \cdots; x^i_1, \cdots, x^i_N; \cdots, x^m_N$ is in the region
$\lambda \le \lambda_\epsilon$, so also is the point $x^1_1, \cdots; -x^i_1, \cdots, -x^i_N; \cdots, x^m_N$, it follows that



I(B*,\.) 


= 0, 


(M 5^ 


Similar considerations show that the non-repeated second derivatives 


d* 


dBiJ'dBx'kr 


I(B*, X,) = 4 




I (t tixA (± 4 -xA 

J\<\, \«-l / \A—1 / 


must vanish. 

Finally, we must show that the repeated second derivatives are positive when
evaluated at a point in $\omega$, except of course in the trivial cases $\lambda_\epsilon = 0$, $\lambda_\epsilon = 1$,
when they must be zero. In order to do this, we shall make use of the fact that
the $v$'s which go to make up $\lambda$ have the Wishart distribution [17]

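(5.5)    $$\frac{B^{\frac{1}{2}(N-1)}}{\pi^{\frac{1}{4}m(m-1)} \prod_{i=1}^{m} \Gamma[\frac{1}{2}(N-i)]}\, v^{\frac{1}{2}(N-m)-1}\, e^{-B_{ij} v^{ij}}\, dv^{11}\, dv^{12} \cdots dv^{mm}.$$

(Because of the relation $v^{ij} = v^{ji}$, only $\frac{1}{2}m(m+1)$ of the $v$'s appear as
differentials.) It will be useful to have the notation

$$G(B, N-1, m) = \frac{B^{\frac{1}{2}(N-1)}}{\pi^{\frac{1}{4}m(m-1)} \prod_{i=1}^{m} \Gamma[\frac{1}{2}(N-i)]},$$

$$V(B, N-1, m) = v^{\frac{1}{2}(N-m)-1}\, e^{-B_{ij} v^{ij}}.$$

With the aid of (5.5) we shall now compute the moments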





for the case in which the matrix $\| B_{ij} \|$ has the form

Bu « • • Blmi 0 • • • OBlm 

0 


( 6 . 6 ) 


Bmil B m 0 ••• 0 

0 ... 0 


• • 
0 

B»iO ... 0 


IIS || 


n 


where || B || stands for || Bt || 4- • • • 4- II B, ||, and all other B’b, except those 
indicated, are zero. Let us designate by (9) the set of v' 3 which correspond to 
the rows and columns of B, and by (v — 8) the remaining p’s. We then remark 
that the result of integrating (5.5) with respect to the p’s in (p — 8) is to reduce 
it to the corresponding distribution for the variables in the set 8, thus: 


(5.7) 


G(B, N - 1 , m) J V(B, N - 1 , m) d(v - 8) 

= G(B, N - 1, m - mi) 7(5, N - 1, m - m x ) f 


where || Bu || is the inverse of the matrix obtained by inverting || £<,• ||, and 
striking out the first rows and columns, that is 


& l = £*', (fc, l * mi + 1» • • • , m )‘ 


Then, 


G(B, N - 1, 



V(B,N - 1, m)d(p — 8) 


can be written as 


(5.8) 


■ aiB ’ N ~ 1 + *■ ■»/<*- # 

X V{B,N- 1 +2h,tn)d(v- 8) 

- - r+^ g(g - *-»+»—-> 

X v? ... p^7(5, N - 1 + 2h, m - mi). 


It can be seen from (5.6) that 

IISII-IIAII+ +11-Bn II+ 11 All 

since of all the rows and columns of || Bn j| which are involved in || 5 || it is 
only the last in which a non zero element appears outside of the blocks || B% (|, 
• • • »II B 9 1|. Consequently, the p’s corresponding to the determinants p* , • • •, 





v p are independently distributed, so that if in (5.8) we integrate out all the 
remaining p's but these, we shall be left with a product of factors 

G(B,N-l,m) ffQ(B t ,N -l+2h,k t ) 

G(B, N-l + 2h,m)'bh G(B,, N - 1, k,) 

X G(B„ N- 1, k t )v7 h V(B t) N - 1 + 2h, k t ) 


X 


G(B P , N - 1 + 2h, kp) 
G(Bp,N-l,kp) 


. G(B P , N 


1, k p )v-; h V(B P) N - 1 + 2A, k t ), 


where k p stands for the order of 11 B„ 11. And this, when integrated with respect 
to the p's in Ps, • • • , v p , yields 

G(B, N - l, m) ff G(B t , N - 1 + 2h, k t ) G(B VI N - 1 + 2h, k p ) 
G(B, N - 1 + 2 h, to) • fi ‘ G{B' t ,N - 1, k t ) * " G(B P , N - 1,\) ’ 


which, because of the definition of the G’s, reduces to 


tt TliCAT — i) + h] t’t yj — t)] 

it r[i(AT - z)] ’ it ii T[$(N - i ) + h] 


X B~ k B'i ... BUB"*- 


Denoting the product of ratios of I”s by Kh , and recalling the form of || B„ ||, 
we therefore have 


(5.9) 

with 


r p* ■ 

- K h B h p B'~ h 

L»} • • • Vp m 

Bn • * • 

Bin, 0 • • • 0Bl„ 


0 

Bm\l • # * 

Bm, m , 0 • • • 0 

0 

• 

0 

• 

• 

: II*, II 

0 

B m i0 • • • 

0 


But it is not difficult to see that under the condition (5.6), the matrix || B p || 
is also the inverse of the matrix obtained by striking out the first to rows and 
columns in the inverse of || B' ||. Making use of this relation, we can apply the 
Jacobi theorem to (5.9), and put that expression in the form 


E 



K k B r*, 


where || || is the matrix in the upper left hand corner of || B' ||, namely




Let the subscript 0 on a B stand for the result of replacing B<,y, by £<,/, + 
Pi^t . For sufficiently small values of the 0’b the matrix || Bt # | j will still be 
positive definite, so that we shall have 

- ^r— -/ T^f** hl ** tw< '* « KkB$, 

t k«(«-d) nr(KAr-t)] P 

>'-i 

which we can put in the form 

(5.10) K' f - r -- * dv - „ -i r - Sy-iv 

J v,.--v p bVbP^ 

Wilks [13] has shown how to generate moments of determinants by the device 
of replacing 0i tii by d<„-, + , and integrating with respect to the {’s from 

— <» to oo. Applying this process 2 h times to the left hand side of (5.10) gives 

* hk K' J (~~- Y F(B„ N - l, m) dv, 

J \t»i • • • v p / 

which when multiplied by r~ ilh B i<,ir ~ 1) yields 

E[(\ s,N ) h ] 


when the 0’s are set equal to zero. 

To obtain the value of this expression, we may perform the same operations 
on the right hand side of (5.10). But before so doing, we shall put Bf in a 
more convenient form. We have 

_ D J5 D* 

where B mm is the inverse element of B mm in || E ||, and Big 1 is the cofactor of 
Bug in B u », the result being obtained by expanding Bg according to minors of 
the first row and first column. Similarly, 


(5.11) 

From (5.11) we have 


B«B 1 .B-B? B 35 m " , .B( 


/a 


B 

B\ ••• Bp 


1 - BLIT". 


/u 


Bi ’ 


so that if we put B BT 1 • • • B^ 1 = A, we find that 


Bg * 



BLBT. 


m 


Bi_ 

Bx 


Bi B$\ 

W*iJ 








Thus the result of multiplying (5.10) through by $B^{\frac12(N-1)}$ (where no $\beta$'s are 
substituted in this determinant) can be put in the form 


(5.12) 




B$. 


Expanding the expression in curled brackets, we get 




oo 

z 

r-0 


r[K-^ 1) + p] nr/BffV n-lUN-D+h+ri 

rirtKW - i)l b ‘[bTV “ 


(1 - A)’. 


If we let Bifi stand for the result of replacing Bn by Bn — < in By, we can write 
this as 


(5.13) 


A Mat-1) V — 1) + v] /, i \r/nK«-j)+» 

A h ( “ } ( l) 1 

r[J(A( — 1 ) + h] d ' n-lKw-U+M 

A T[$(N - 1) + h + v] dt’ m 


the derivatives being evaluated at t = 0. 

Now Wilks’ results show that the operation of introducing ft,,-, + £*i£n > n t° 
Bifit to replace and integrating with respect to the £’s, when repeated 2 h 
times on Br$ ( * ,-l>+w , produces 


7T 


D—i(iV~1) 
D\i 


fv nm - m 

fir [M-Q+'h] 


when the $\beta$'s are finally set equal to zero. Reversing the order of summation, 
differentiation and integration in (5.13), we thus obtain 




r[|(JV - i)] aKk ^ f* vm ~ 1 ) + r] 


(5.14) 


Now 


S m(N - i ) + h) “ f=i FlTiKN - 1)] 

N/ (\ kY(B ni \~ p DtW-lR* nm -1 ) + h] fd" b-uv-xA 

x (1 - A) (Bx ) b, fffiiT-iJ+T+T] W Bm a 



r(i(N - i) + p] 
r[l(N -1)] 




so that (5.14) becomes 

m(N-t)i v roov - 1 )] 

fi r[i(N -♦) + *] £iv\ r[KN -1)] 

V./, n. rm-D + h] r[i(N-D + r] 
m(N - 1) + h + rj* r[i(iv - 1)1 • 





From this it appears that the $h$-th moment of $\lambda_I^{2/N}$ is given by 


(5.15) 


_ ft raw - *•) + h) ft ft raw -1)] 

E[{\ ) ] - ll mN _ ^ 1111 raw - i) + h] 

v x' (] \y raw — l) + p] 
X4 h {1 A) piraw-i)] 


^ raw -1) + p] raw -1) + A] 
* raw - 1 ) + A + p]* raw - i)i ; 

A considerable amount of cancellation will take place in (5.15), for $m$ is greater 
than any $k_t$. Suppose the largest $k_t$ is $k_{t'}$. Then we can cancel its product 
into the first one, with the assurance that there will be at least one factor 

(5.16) $\dfrac{\Gamma[\frac12(N-1)]}{\Gamma[\frac12(N-1)+h]}$

to cancel the corresponding factor under the summation sign. Hence we have 

_ TT raw - 1 ) + A] ft, ft,, raw - »■)] 

. * [(x } 1 ■ JL ~mN- oT'ii M raw -V) + Ai 


(5.17) 


f[(x j/ yi= n 

»— A ; t /+1 


raw 


v A »(w-j> _ AV raW — l) + p] raw — l) + p] 

* £r ; piraw-i)] 'raw -1) + a + *r 

where $\Pi'$ indicates that $t'$ has been omitted, and $\Pi''$ indicates that one factor 
(5.16) has been cancelled. Then we can take out the factor $i = m$ in the first 
product, putting it under the summation sign, where, together with the final 
factor in each term of the sum, it gives rise to the combination 

$\dfrac{\Gamma[\frac12(N-1)+\nu]}{\Gamma[\frac12(N-m)]\,\Gamma[\frac12(m-1)+\nu]} \cdot \dfrac{\Gamma[\frac12(N-m)+h]\,\Gamma[\frac12(m-1)+\nu]}{\Gamma[\frac12(N-1)+h+\nu]}.$

After making this reduction, we obtain 


(5.18) 


r,rA*/^M _ TT raw - i) + A] ff, raw - *)] 
m ) 1 ” .-ir+i" raw - i)] ii M r(fw - *) + A] 


v A i(p-n V' a _ A \* r[lW — 1) + p] B[§W — m) + A, i(m — 1) + p] 

* ' Piraw-l)] Baw-m),i(m-l) + p] * 

The products of ratios in the first part of (5.18) are of the type discussed by 
Wilks in connection with integral equations of type B [12]. It follows from his 
results that $\lambda_I^{2/N}$ is distributed like the product 

$z \cdot \theta_1 \cdots \theta_{m'} \qquad (m' = m - k_{t'} - 1),$

where $z$ and the $\theta$'s are independently distributed, with the distribution of the 
$\theta$'s given by 

fi rJ(S- - K) -‘ ! " (1 - 







where the $b_i$ and $c_i$ are constants which depend on $N$, $m$, and the sizes of the 
blocks, but not on A, and the distribution of $z$ is given by 


F(z) - A Hn ~ 1} £ 


„ 4V rim - 1) + r] s w, - , - , (i - z) Hm ~ 1)+ ^ 

K ’ v\T\m - 1)] *Bft(ff - m), Km - 1) + A ' 


Consequently, the probability that $\lambda_I$ lies between zero and $\lambda_\varepsilon$ is 


J(A, X,) - A* 0 '- 1 ' / £ 

J S p—o 


(1 - A) v 


rfKAT —!)+*'] 

v\Tim- 1)] 


x/( 0 ) 


B[KiV - «), f(m - 1) + y] a ’ 


where the integral is to be extended over the region 

$S$: $0 < z\,\theta_1 \cdots \theta_{m'} < \lambda_\varepsilon^{2/N}$, $0 < \theta_i < 1$, $0 < z < 1$. 


Let us integrate first with respect to $z$ and then with respect to the $\theta$'s; we have 


(5.19) 


J( A, X.) - / A* w_1> 


E (1 - A)' 


v-0 


rlKJV - l) + v) 
vinKAT -T)] 


x 


B'lm - m), Km - 1) + <p\ t < a \ 

B[K^-m),Km'-l) + p] m<W ' 


where St is the set 110, < Xj /Ar , 0 < 0< < 1, and 


(5.20) $B'(u, v, \varphi) = \int_0^{\varphi} z^{u-1}(1-z)^{v-1}\,dz = \int_{1-\varphi}^{1} z^{v-1}(1-z)^{u-1}\,dz = B(v, u, 1-\varphi),$


$\varphi(\theta)$ being the upper limit for $z$ for fixed $\theta$. It is clear that the subset of $S_\theta$ for 
which $\varphi(\theta) < 1$ will not be of measure zero in the $\theta$-space, since we assume that 
$0 < \lambda_\varepsilon < 1$. 
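
The relation between the two incomplete Beta integrals in (5.20) is simply the substitution of $1 - z$ for $z$. A minimal numerical sketch of this reflection follows; the function names and the particular values of $u$, $v$ and $\varphi$ are hypothetical, and a plain midpoint rule stands in for any library routine.

```python
import math


def beta_integral(p, q, lower, upper, n=20000):
    """Midpoint estimate of the integral of z^(p-1) (1-z)^(q-1) over (lower, upper)."""
    step = (upper - lower) / n
    total = 0.0
    for k in range(n):
        z = lower + (k + 0.5) * step
        total += z ** (p - 1.0) * (1.0 - z) ** (q - 1.0)
    return total * step


def incomplete_beta_lower(u, v, phi):
    """Integral of z^(u-1) (1-z)^(v-1) from 0 to phi."""
    return beta_integral(u, v, 0.0, phi)


def incomplete_beta_upper(v, u, lower):
    """Integral of z^(v-1) (1-z)^(u-1) from `lower` to 1."""
    return beta_integral(v, u, lower, 1.0)


def complete_beta(u, v):
    """Gamma(u) Gamma(v) / Gamma(u + v), evaluated through log-gamma."""
    return math.exp(math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v))


if __name__ == "__main__":
    u, v, phi = 2.5, 3.0, 0.3  # illustrative values only
    print(incomplete_beta_lower(u, v, phi), incomplete_beta_upper(v, u, 1.0 - phi))
```

The two printed values should agree to rounding error, and the lower integral plus its complement over $(\varphi, 1)$ should reproduce the complete Beta function.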

The relation between (5.19) and the corresponding expression for the multiple 
correlation coefficient without fixed variates—the case § =* 0 in (4.4)—may be 
clearer if we put 

(5.21) p - 1 - A - 

where B nm is the inverse of B nm in || B || , and B” is the inverse of B u in || Bi || . 
Then the required probability of rejection when p has any fixed value is 


Kp, i - xD 



i< »-t> rim — i) + y] 

rim-i)] 

B[Km — 1) + v, KAT — in), 
B[Km -l) + y, HN - 


1 - 
m] 


-/(*) dd, 


where we have used the relation (5.20) between the incomplete Beta functions. 
Differentiating with respect to $\rho$ before performing the integration with respect 
to the $\theta$'s, we find by a computation similar to that in section 4 that each term 
in the series is positive except where $\varphi(\theta) = 1$; so that we have 

>0 <X.*1,Q). 


And by (5.21), we then have 


a 8 / 

afilm 


> o. 


Since the argument is clearly independent of which B^j, (m y* v) we take, it 
follows that the test is locally unbiased. We have therefore proved: 

Theorem III. If $x^1, \cdots, x^m$ have the joint normal distribution (5.1), then the 
likelihood ratio test for the hypothesis that the $x$'s are independent in sets is locally 
unbiased. 

In certain types of statistical material it may be important to consider, not 
the independence of the $x$'s themselves, but of their deviations from regression 
functions. For example, in the case of several related time series, it may be 
desirable to eliminate the trend of each $x^i$ by means of, say, a second degree 
polynomial in $t$. Consider then in general a population whose distribution 
function is of the form 


c <*»)(*#—cf*') 


( m , V - m + 1, •.., m + q) 


with unknown Bn and C l „ . The likelihood ratio for testing the hypothesis H t 
that the sets of deviations 


Cl A •. •, x 1 "' - C;V; • • •; x - C?~ ,+i A 


- CTxT 


are independent is 


where 



<T = 2(xl - ClA)(xL - Cix’ a ) 


and Cl is the usual least squares estimate of Cl , given by 

ClaT m a” 

with 

$a^{rs} = \sum_{\alpha} x^r_\alpha x^s_\alpha \qquad (r, s = 1, \cdots, m + q).$

An examination of the characteristic function of the $d^{ij}$ shows that their 
distribution law is the same as that of the $v^{ij}$ of the preceding discussion, except 
for the fact that $N - 1$ is replaced by $N - q$. Consequently the above results 
on freedom from bias, and also those of the next section, apply equally well to 
the $\lambda_I$ test for the independence of deviations from regression functions. 
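
As an illustration of the time series example mentioned above, the following sketch removes a second degree polynomial trend from each of two series by least squares and forms a criterion from the deviations. It assumes, as is usual for two sets of one variate each, that the criterion reduces to $|d| / (d^{11} d^{22})$; the function names are ours, and the fixed variates are taken to be $1$, $t$, $t^2$ (so that $q = 3$ in this hypothetical case).

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def quadratic_residuals(series):
    """Deviations of a series from its least-squares second degree polynomial in t."""
    basis = [[1.0, float(t), float(t * t)] for t in range(len(series))]
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    moments = [sum(row[i] * y for row, y in zip(basis, series)) for i in range(3)]
    coef = solve(gram, moments)
    return [y - sum(c * f for c, f in zip(coef, row)) for row, y in zip(basis, series)]


def independence_criterion(series_1, series_2):
    """|d| / (d11 * d22) computed from the detrended series (assumed form)."""
    e1 = quadratic_residuals(series_1)
    e2 = quadratic_residuals(series_2)
    d11 = sum(a * a for a in e1)
    d22 = sum(b * b for b in e2)
    d12 = sum(a * b for a, b in zip(e1, e2))
    return (d11 * d22 - d12 * d12) / (d11 * d22)
```

Because only the deviations enter, adding any second degree polynomial in $t$ to either series leaves the criterion unchanged, which is the sense in which the trend has been eliminated.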





6. On the moments of $\lambda_I^{2/N}$. Although we have succeeded in proving the 
unbiased nature of the preceding test only in the local sense, we can show that the 
moments of the criterion $\lambda_I^{2/N}$ have a property which seems very closely related to 
that of furnishing a completely unbiased test. For it can be shown that each 
of the quantities 

$E[(\lambda_I^{2/N})^h], \qquad h = \tfrac12, 1, \tfrac32, \cdots$

is greater when $H_I$ is true than when any alternative $H'$ holds. It will perhaps 
be sufficient to prove this statement in detail for the case where $h = 1$ and 
where $H_I$ is the hypothesis that the matrix $\|B_{ij}\|$ has the form $\|B_0\| + \|B_{i_3 j_3}\|$: 

BiiBu 

0 0 

Bn Bn 

Bn Bn 

0 0 

BuBn 

0 0 II flu III 

in the notation of the preceding section we then have 

$i_1, j_1 = 1, 2; \quad i_2, j_2 = 3, 4; \quad i_3, j_3 = 5, \cdots, m.$

Even when $H_I$ is not true we find that 


(e.r) iVlV'T]- 


G(fl, N - 1, m) 0(B, N - 1 + 2ft, m - 4) 
0(B, N — 1 + 2h,m) g(B, N — 1, m — 4) ’ 


where B iih = B l,n . Using the definition of the G’s in section 5 and the Jacobi 
theorem, we can write (6.1) in the form 

E[\ v u T | v ith |- A ] = K k B~ h 

where B is the determinant of the matrix composed of the first four rows and 
columns of || B^ ||. In the general case we therefore have 


Bn Bn Bn Bu 
Bn Bn Bn Bu 

Pll 

Bn Bn B» Bu 
Bn Bn Bn Bu 

Thus if we set h — 1, and replace B ilh and B iiit by B h „ + $$ -f- 
and + tutu + respectively, indicating this replacement by a 

prime, we obtain 

fl((X ,/w ) 1 l - Ki f 


(«. 2 ) 






Treating $B'$ as a bordered determinant, we can reduce it to 

B‘- B<m(l + 

- B (11) (l + + B&h^^i’) 

- B»< 1 + B&«»)(1 + *W&tff)(l + BilUM) 

= B(1 + B^ftffXl + BM'tftfXl + BU&M’Xl + 

where the subscripts on the $B$'s indicate the sets of $\xi$'s still contained in the 
determinants, and $\|B^{ij}\| = \|B_{ij}\|^{-1}$. Similarly, 

(6.4) B' 

- 5<i + + «MW>a + «©(i + m&&), 

the inverse now being taken with respect to || B ||. 
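
The reduction of a bordered determinant used here can be checked numerically. The sketch below assumes that each replacement adds a product $\xi_i \xi_j$ to the matrix, so that $|M + \xi\xi'| = |M|\,(1 + \xi' M^{-1} \xi)$, and that the factors are peeled off one set of $\xi$'s at a time, each inverse being taken with respect to the matrix that already contains the earlier sets. The helper names and the numerical matrix are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def determinant(matrix):
    """Determinant by elimination, tracking the sign of row interchanges."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def add_outer(matrix, xi):
    """Return the matrix with xi_i * xi_j added to each element."""
    n = len(xi)
    return [[matrix[i][j] + xi[i] * xi[j] for j in range(n)] for i in range(n)]


def sequential_reduction(matrix, vectors):
    """|M| times the product of (1 + xi' M_k^{-1} xi), M_k holding the earlier xi's."""
    value = determinant(matrix)
    current = [row[:] for row in matrix]
    for xi in vectors:
        y = solve(current, xi)
        value *= 1.0 + sum(a * b for a, b in zip(xi, y))
        current = add_outer(current, xi)
    return value
```

Applied to a positive definite matrix and two vectors, `sequential_reduction` should agree with the determinant of the fully augmented matrix, whichever order the vectors are taken in.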

But between, say, B'd*’ and B\lt ], there is the relation 

(6.5) Bi# - B&& - B[$B lWith B {$, 

where || B(i*x,„ || = || B('j* || -1 , that is, the inverse of the matrix obtained by 
deleting the first four rows and columns of || B}[t) ||. Consequently 

< Bwwtf 

with equality holding only for those values of the $\xi$'s for which 

«ffB$? = 0 ti = 5, • • • , m. 

And this set of $\xi$'s will not make up the entire $\xi$ space unless $\|B_{ij}\| = 
\|B_0\| + \|B_{i_3 j_3}\|$. Applying the same kind of reasoning to the other 
quadratic forms in (6.4), we can therefore show that 

J £ 1(W-I) B'' 1 d£ 

< B~ l j (l + B ilil $$)~* {l,+l) ... (l + 5$l)lM)^ +,, dS. 

The last form can be reduced to a sum of squares with unit coefficients by a 
linear transformation of the £ W) ’s; thus 

( 6 . 6 ) 

sir'Ji anil, |-»a+•••<!+ «. 

And by making use of the fact that 

• B\m) = Ban) • | B(u> <,/, |, 





we can express the right-hand side of (6.6) as 

s- / aw i iw, |-*o+**■'■• - a+«!X’r* l ' m ’ «• 

This in turn becomes [cf. (6.4)] 

«-*/1 a™..,. i-*(i+s-' 1 1<” + sai’ 

X a++iiffeirr"'" 1 « 

= /1 «<»>.,(, ra+s“'‘ i™ +Fa! 1 «“ 

x a + + ssil'iHT 1 """ «. 

At this stage we can write 

I | = | B ilh |(1 + B* Wl ij"t}")0 + BVrtftf), 

where || B**}’ 1 || = || Bun,,-, || _1 , and apply the relation 

BVr - £ar - , ii £«><„•, ii - ii w ir. 

Therefore, 

ai&'-tM <Fa!'{ff{ff, 

unless = 0 (** = 3, 4). We can thus continue as follows 

J d£ 

< 1 B hh r 1 / (1 + + BVt il $tfx)~* N+1) 

x d + + 2$ , $ > r‘ (w+i) 

Transforming the f^'s, we get 

| T / |B?i)‘' 1 l~*(l + B* <, ' , { < ( } > {J1 ) )"* <W+1) (1 + 2^ ) ^ > )- 1(w+l) 

X (1 + Z$ > *j; , )“* , (l d*. 

Since | B*i}* 1 | -1 = | Ba) ilh |, this becomes 

I B ilh I" 1 / (1 + + S{i®{i! ) ) _l(w+,) 

X (1 + 20<I ) )~**(1 + 2tiUh)~ M+1) # 

= j (1 +2^ , |il , )~ lw (l+2^ ) i < ( J ) )“‘ ( ' r+1> 

X (1 + 2fc ( ? Cffr^Cl + 2{i; ) {“>)-*<* +1 > df. 





!H-l> 


Collecting these results, we finally obtain 
Ki f & w ~ l) dl 

l8 ' 7> «/(i + leSW'd 

x d++2 ^j ) { < , j ) )- | w+,> # 

with equality only in case $H_I$ is true. But the right side of (6.7) is the first 
moment of $\lambda_I^{2/N}$ computed under the hypothesis $H_I$, while the left side gives 
the corresponding moment in the general case. 

The possibility of carrying out this reduction for the case in which the matrix 
$\|B_0\|$ has more than two blocks, or blocks of unequal size, seems sufficiently 
clear. And to obtain higher moments, we have only to introduce the proper 
number of $\xi$'s into each set. We then have: 

Theorem IIIa. Let $\lambda_I$ be the likelihood ratio appropriate to testing the hypothesis 
$H_I$ that the normally distributed variates $x^1, \cdots, x^m$ fall into the mutually 
independent sets $x^1, \cdots, x^{m_1}; \cdots; x^{m_{p-1}+1}, \cdots, x^m$. Then the expected value of 
$(\lambda_I^{2/N})^h$, $h = \tfrac12, 1, \tfrac32, \cdots$, is greater under the null hypothesis $H_I$ than under any 
alternative hypothesis in $\Omega$. 


7. The general regression problem. Let the variates $x^1, \cdots, x^t$ be 
distributed according to the law 

(7.1) $\dfrac{|B_{ij}|^{1/2}}{\pi^{t/2}}\; e^{-B_{ij}(x^i - C^i_p x^p)(x^j - C^j_q x^q)}.$



Throughout this section, let the ranges of the indices be 

$i, j = 1, \cdots, t; \qquad p, q = t + 1, \cdots, m;$

$r, s = 1, \cdots, m; \qquad r', s' = 1, \cdots, t + q;$

$\mu, \nu = t + 1, \cdots, t + q; \qquad \sigma, \tau = t + q + 1, \cdots, m.$


In (7.1) we therefore have $t$ random variates, and $m - t$ fixed variates. 
Consider the hypothesis $H$ that the $x^i$ are independent of the last set of $x$'s, namely 
$x^\sigma$. We have 

$\Omega$: $\|B_{ij}\|$ positive definite, $-\infty < C^i_p < \infty$, 

while for $\omega$ we impose the additional requirement 

$C^i_\sigma = 0.$

Thus in general we have for the distribution of random samples $O_N$, $N > m$, 





while when H is true, we have 


(7.3) 


P = 




ri«‘ 


Differentiating (7.2) with respect to the B’s and C ’s and setting the derivatives 
equal to zero gives us the conditions 

(7.4) Z Cpxlxl = Z 


0-1 


(7.5) 


S'* = 4 L (x‘« - CUS)(xl - C’xl). 

Jy a-1 


As in section 2, we put 

$a^{rs} = \sum_{\alpha=1}^{N} x^r_\alpha x^s_\alpha$

and assume that the fixed values $x^p_\alpha$ have been so chosen that $\|a^{pq}\|$ is positive 
definite. Then (7.4) and (7.5) can be combined to give 


where 11 a» 0 11 1 


Similarly, 


where 


P^ = 4(a*'-a‘V Pff a«) =4^ 

= || a vt ||. It then follows that 

p. - nv (£)">. 

$ = a" - || a', Ip 1 - || o'*' ||. 


The matrix || a” || will be positive definite except for a set of probability zero, 
so that we can consider 11 o° 11 as the inverse of the matrix obtained by removing 
the last m — t rows and columns of the inverse of || a r * ||, and || || as the 

inverse of the matrix obtained by removing the last q rows and columns of 
|| o’ - '*' || _1 . Then by the Jacobi theorem 


i5"r‘ = 



14'r 



so that the appropriate likelihood ratio is given by 





It will be advantageous to complete the matrix $\|B_{ij}\|$ in (7.1) by defining 

$B_{ip} = -B_{ij}C^j_p,$

$B_{pq} = C^i_p B_{ij} C^j_q.$

(Evidently $B_{ip} = 0$ for $i = 1, \cdots, t$ and fixed $p$, if and only if $C^j_p = 0$, 
$j = 1, \cdots, t$.) We can now write (7.2) as 

(7.7) $P(x, B) = \dfrac{|B_{ij}|^{\frac12 N}}{\pi^{\frac12 Nt}}\; e^{-B_{rs}a^{rs}}.$

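We next notice that $\lambda$ is invariant under the transformations 

$x^i \to \alpha^i_j x^j, \qquad x^\sigma \to \beta^\sigma_\tau x^\tau,$

so that if we put 

$I(B, \lambda_\varepsilon) = \int_S P(x, B)\, dx^1_1 \cdots dx^t_N,$

where the integral is extended over the region 

$S$: $0 < \lambda < \lambda_\varepsilon,$

it turns out that 

$I(B, \lambda_\varepsilon) = I(B^*, \lambda_\varepsilon),$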


provided 


B*j — ociBkia), B* h — alBicp, B*, = alB/crPl- 


To prove the locally unbiased character of the test, we may therefore consider 
the derivatives 






and assume that || £* || and || oT || are in diagonal form. We also observe 
that X is unaltered by the transformation 


x’ x‘ + 


We therefore have 


I(B*, X.) 


| jg* | tAr r 


Thus, 


hb:,u -[± 


* ” .-i 


x a x a e 





which is easily seen to be zero. Again, consider a non-repeated second partial 
derivative, say 


/(Bo*, X.) 

dBtdBt 


= -2 


I Bu 


* ii* 




a 


B* kl a" 


- 2 2 a-o 


r*-£ -tJ.tA 
1 / 




This plainly vanishes if $k \neq l$; but it is by no means easy to see what happens 
when $k = l$, even when $\sigma \neq \tau$. Let us therefore study the distribution law of 
$\lambda^{2/N}$ for the case 

$B_{i\sigma} = 0, \qquad i \neq 1.$


(We shall not, however, assume that the transformation $B \to B^*$ has been 
made on the $B$'s.) 

Define 

Bpq ~ Bpq — BpiB' B jq , 


an vt 

a a^a y 


where $\|a_{\mu\nu}\|$ now stands for the inverse of $\|a^{\mu\nu}\|$. These expressions will 
arise when we adapt Wilks' method of moment generating operators [13], based 
on the identity 

(7.8) / e~ Br ‘ a, ’dx\ • • • dxh = « r' Nt B~ iN exp (- BpqcT) 


to the problem. We shall understand from now on that $B = |B_{ij}|$ and 
$\|B^{ij}\| = \|B_{ij}\|^{-1}$. Let us rearrange the form in the exponential on the 
right, thus: 
5 P4 o M = {Bp,<r + 2B ha a“’ + - 2 \B 0i B i, B i J" 

- B. i B u B, r a^a ll ,a ,T ) - B ti B iS B ir a" 

« Q - B'iB^BjrcT 
= Q - B%i . 

A subscript $\beta$ will denote the result of replacing $B_{r's'}$ by $B_{r's'} + \beta_{r's'}$, and a 
prime will indicate that each $\beta_{r's'}$ has been replaced by $\beta_{r's'} + \xi_{r'}\xi_{s'}$. Consider 
now the result of integrating the right hand side of (7.8) after these 
replacements have been made: 

JBf^ s exp ( — 5p*/jO w ) dfi • • • d%td£t+i • • • d|<+« 

/ B'f*”&*'«'({ e'Vdl^dti, 


(7.9) 




Let us integrate first with respect to the . Wilks has shown how to write 
Qf in the form 

Ql - -Q'x, + + |? , 

where 

Q'xe - BrtBpBntr + 2 .B^B?B^cT + B^B^a^aT. 

This latter expression is thus free of the . Consequently, 

Je- Q idi„ - (0 i# 

where 

Ow = ^aM^^WB^B'y^i), 


which can be written 

{ Bm Bp’ B rk 0 hit a*’ + 2 B^B? B*B? 

Jig 

+ B,<B' li ii B ri B'ff* t l; i i l a"‘a ll ,a"}. 

The method of reduction used by Wilks can now be applied to Qw and Q«p, 
and gives 

Q'u, + Qlff - B^ByB^a*’ + 2B^Bj'By.a'*' + B^B^a^a”, 

an expression which does not involve the $\xi$'s. Thus 
(7.10) J = ir h | a' 11 ' r‘BjV 0< .fl^. 

Now the quantity 

«**"»« = t b;-', 

r-0 JM 

where 5° stands for the cofactor of B<, in || B„ ||, can be expressed in terms 
of Bff , provided we use our assumption that B,-» = 0 , i ^ 1, whereupon yuBf* 
reduces to the single term yBp n . In fact, we have 

E\g ,| a" |*I - £ II *(1V - m + t + 1 - i, 2 h) \ a” \\' m B? h ” +k) 

(7 ' u) “ X - r„*v 

x «p »Jt n 4"* u ‘ i»" rv«'** 5 





where, following the notation used by Wilks [13], 

gp _ R = B 1 * exp (-5 pe o M ), 


$\psi(a, b) = \dfrac{\Gamma[\frac12(a + b)]}{\Gamma(\frac12 a)}.$


And (7.11) can be written as 


(7.12) 


Elg„ |o" | A ] = i?7r iA " n^.|a w |‘B* V 03 

i~l 


v V r[|(JV — g) + A] 9” / ' R -ti(Ar- 4 )+*^ 

X h V ! r”[i(JV - q) '+ h + y] du’ { “ u )u ^ ' 


where $B_u$ stands for the result of replacing $B_{11}$ by $B_{11} - u$. Changing $\beta_{r's'}$ into 
$\beta_{r's'} + \xi_{r'}\xi_{s'}$ and integrating, we then find that by virtue of (7.10) 


(7.13) 


Now 


E[g p | a" T | a''' f 1 ] = RJ Nt H *■ I o w |* | a!" p^V^ 3 

»-1 


v - 1 * fr y’ mN-q)+h] 9' 
tJo p ! m(N - q) + h +7] du' 




[ BgZ lHN ~ ,)+hl dii = b^ 1<w - , -« ,+M t I ‘ II^(N - q + 2h + 1 - i, -1), 

J t-1 

so that (7.13) becomes 

E[g, | a" I" |o r '*' I -4 ] = Att 1 *' II ^(iV — m + t + l — i, 2h) 

*-l 

(7.14) X n 1 ^(AT - g + 2h + 1 - i, 

<-1 

v V / JW-?) + h] 9' 

xfifl 6 h 7\ rff(N“ g)“+T + r] S? (B '“ )u -°- 

Comparing (7.14) with (7.12), and making use of the fact that 

$\psi(a, -1)\,\psi(a - 1, -1) \cdots \psi(a - 2h + 1, -1) = \psi(a, -2h),$

we thus have 

Elg, | o” |* | a''*' I"*] = &r lw< II *(N - m + t + 1 - i, 2h) 

i-1 


xiltiN - q + 2h + l - i, —2h)\a M [ A | a'*' \~ K B? q e'** 
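
The identity for the $\psi$'s used in this comparison is a telescoping product of Gamma ratios. A small numerical sketch follows, taking $\psi(a, b) = \Gamma[\frac12(a+b)] / \Gamma(\frac12 a)$ and working with logarithms; the function names and the sample arguments are ours.

```python
import math


def log_psi(a, b):
    """Logarithm of psi(a, b) = Gamma((a + b) / 2) / Gamma(a / 2)."""
    return math.lgamma((a + b) / 2.0) - math.lgamma(a / 2.0)


def telescoped_log_product(a, two_h):
    """Log of psi(a, -1) psi(a - 1, -1) ... psi(a - 2h + 1, -1), with two_h = 2h."""
    return sum(log_psi(a - k, -1.0) for k in range(two_h))


if __name__ == "__main__":
    # Illustrative arguments; a must exceed 2h so every Gamma argument is positive.
    print(telescoped_log_product(20.0, 3), log_psi(20.0, -3.0))
```

Each factor cancels the denominator of its neighbour, so only $\Gamma[\frac12(a - 2h)] / \Gamma(\frac12 a)$ survives, which is $\psi(a, -2h)$.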





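Setting the $\beta$'s equal to zero, performing the differentiation, and recalling the 
definitions of R and Qi , we then find 

(7.15) $E[(\lambda^{2/N})^h] = \prod_{i=1}^{t} \psi(N - m + t + 1 - i,\ 2h)\ \prod_{i=1}^{t} \psi(N - q + 2h + 1 - i,\ -2h)$

$\qquad \times\ e^{-\gamma B^{11}} \sum_{\nu=0}^{\infty} \dfrac{(\gamma B^{11})^{\nu}}{\nu!}\ \dfrac{\Gamma[\frac12(N-q)+h]\ \Gamma[\frac12(N-q)+\nu]}{\Gamma[\frac12(N-q)+h+\nu]\ \Gamma[\frac12(N-q)]}.$

Taking the first factor from each product, we can convert (7.15) into 

$\prod_{i=2}^{t} \psi(N - m + t + 1 - i,\ 2h)\ \prod_{i=2}^{t} \psi(N - q + 2h + 1 - i,\ -2h)$

$\qquad \times\ e^{-\gamma B^{11}} \sum_{\nu=0}^{\infty} \dfrac{(\gamma B^{11})^{\nu}}{\nu!}\ \dfrac{\Gamma[\frac12(N-m+t)+h]\ \Gamma[\frac12(N-q)+\nu]}{\Gamma[\frac12(N-m+t)]\ \Gamma[\frac12(N-q)+h+\nu]}.$

This last product of ratios of $\Gamma$'s is equivalent to 

$\dfrac{\Gamma[\frac12(N-q)+\nu]}{\Gamma[\frac12(N-m+t)]\ \Gamma[\frac12(m-t-q)+\nu]} \cdot \dfrac{\Gamma[\frac12(m-t-q)+\nu]\ \Gamma[\frac12(N-m+t)+h]}{\Gamma[\frac12(N-q)+h+\nu]}.$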

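Thus the moments of $\lambda^{2/N}$ are connected with an integral equation of type B 
[12] and $\lambda^{2/N}$ is distributed like the product 

$z \cdot \theta_2 \cdots \theta_t, \qquad 0 < z < 1, \quad 0 < \theta_i < 1,$

where the joint distribution of the $\theta$'s is 

$f(\theta) = \prod_{i=2}^{t} \dfrac{\Gamma[\frac12(N - q + 1 - i)]}{\Gamma[\frac12(N - m + t + 1 - i)]\ \Gamma[\frac12(m - t - q)]}\ \theta_i^{\frac12(N-m+t+1-i)-1}\ (1 - \theta_i)^{\frac12(m-t-q)-1}$

and $z$ is distributed independently of the $\theta$'s with the distribution 

(7.16) $F(z) = e^{-\gamma B^{11}} \sum_{\nu=0}^{\infty} \dfrac{(\gamma B^{11})^{\nu}}{\nu!}\ \dfrac{z^{\frac12(N-m+t)-1}\ (1-z)^{\frac12(m-t-q)+\nu-1}}{B[\frac12(N-m+t),\ \frac12(m-t-q)+\nu]}.$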

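The probability that $0 < \lambda < \lambda_\varepsilon$ is therefore 


$I(\gamma, \lambda_\varepsilon) = \int_S f(\theta)F(z)\, dz\, d\theta_2 \cdots d\theta_t,$


where $S$ is the region $0 < \theta_2 \cdots \theta_t\, z < \lambda_\varepsilon^{2/N}$. Putting $\varphi(\theta)$ for the upper limit 
of $z$ in $S$ for fixed $\theta$, and $S_\theta$ for the projection of $S$ into the $\theta$ space, we then have 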

r f * /*. f* i(JV—m+ 0 “l /-j > 

f J ~VB" x* {yB ) [ « vl - *) jAn 




V I » Tl Jo B[%(N -m + t),i(m-t-q) + v] 
If we replace z by (1 — z) we then find 


(7.17) 


I(y, Xo) « [ m 
Ja, 

„ L-yan + (yB u rmm -t-q) + », h ( N-m + t ); 1 - *j\ a. 
*V &"7T B[i(m-*-g) + »<,i(tf->» + *)] / 






As far as $\gamma$ is concerned, (7.17) is essentially the same as (2.8). The 
computation which was made there, together with the type of reasoning employed in 
the latter part of section 5 in connection with the independence test for several 
blocks, then shows that 

0 (0 < e < 1). 

oy 

Remembering that 

y = cTBriBn , 

we see that 

(JUL) =0 -gUlL- = 2a tr 

\dBjo ’ dB tl dB rl ’ 

and we remark that the assumed positive definiteness of 11 a”® 11 implies that of 
II a" II. Hence the relation 

{dB»wir i(y ’ x,) )o = y. r 

together with the fact that we could have obtained the analogue of (7.17) 
under the assumption 

$B_{i\sigma} = 0, \qquad i \neq i_0,$

where $i_0$ is any fixed number in the set $1, \cdots, t$, shows that the matrix of 
second partial derivatives is positive definite when $H$ is true. 

Thus we have 

Theorem IV. Let $x^1, \cdots, x^t$ be normally distributed about means which are 
linear functions of certain fixed variates $x^{t+1}, \cdots, x^m$. Then the likelihood ratio 
test for the hypothesis that the distribution of $x^1, \cdots, x^t$ depends only on a selected 
subset $x^{t+1}, \cdots, x^{t+q}$ of the fixed variates is locally unbiased. 

The result of this section has its most immediate application to those problems 
in the analysis of variance which require simultaneous consideration of several 
interrelated dependent variables $x^1, \cdots, x^t$ in conjunction with a given set of 
independent variables $x^{t+1}, \cdots, x^m$ [15]. For the usual hypothesis to be tested 
in this case is that $x^1, \cdots, x^t$ are jointly independent of, say, $x^{t+q+1}, \cdots, x^m$. 

To return to the general case of (7.1), the method of this section can also be 
used to test the hypothesis that the regression coefficients referring to the $x^\sigma$ 
have particular values, say 

$C^i_\sigma = C^i_{\sigma 0}, \qquad i = 1, \cdots, t; \quad \sigma = t + q + 1, \cdots, m,$

the remaining $C$'s and the $B$'s being left unspecified. Since we have 

$x^i - C^i_\mu x^\mu - C^i_\sigma x^\sigma = x^i - C^i_\mu x^\mu - (C^i_\sigma - C^i_{\sigma 0})x^\sigma - C^i_{\sigma 0}x^\sigma,$

by the device of replacing $x^i_\alpha$ by $x^i_\alpha - C^i_{\sigma 0}x^\sigma_\alpha$, we can reduce this problem to 
that of testing the hypothesis that 

$C'^{\,i}_\sigma = C^i_\sigma - C^i_{\sigma 0} = 0.$
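
The effect of this replacement on the least squares estimates can be seen in a small numerical sketch. It takes one chance variable and two fixed variates, one playing the part of an $x^\mu$ and one of an $x^\sigma$, with no constant term; the data, the function names and the hypothetical value `c_sigma_0` are assumptions made only for illustration.

```python
def fit_two_regressors(x, z_mu, z_sigma):
    """Least-squares coefficients of x on two fixed variates (no constant term)."""
    a11 = sum(u * u for u in z_mu)
    a12 = sum(u * v for u, v in zip(z_mu, z_sigma))
    a22 = sum(v * v for v in z_sigma)
    b1 = sum(u * y for u, y in zip(z_mu, x))
    b2 = sum(v * y for v, y in zip(z_sigma, x))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2x2 normal equations.
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def residual_sum_of_squares(x, z_mu, z_sigma):
    """Sum of squared deviations from the fitted regression."""
    c_mu, c_sigma = fit_two_regressors(x, z_mu, z_sigma)
    return sum((y - c_mu * u - c_sigma * v) ** 2 for y, u, v in zip(x, z_mu, z_sigma))


def shift_by_hypothesis(x, z_sigma, c_sigma_0):
    """Replace each x by x - c_sigma_0 * z_sigma, as in the device described above."""
    return [y - c_sigma_0 * v for y, v in zip(x, z_sigma)]
```

After the replacement the estimate attached to the $x^\mu$ variate and the residual sum of squares are unchanged, while the estimate attached to the $x^\sigma$ variate is lowered by exactly `c_sigma_0`, so that testing the specified value becomes testing for zero.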


Similarly, the problem of testing whether the linear functions aJCj have 
specified values uU comes under the same heading [7]. 

A particularly interesting case of the general regression problem is that in 
which $m = t + q + 1$, so that the null hypothesis $H$ states that the chance 
variables $x^i$ are independent of the fixed variate $x^m$, though they may depend 
upon $x^{t+1}, \cdots, x^{m-1}$. In this case we are able to find the exact distribution 
law of $\lambda^{2/N}$ without assuming that any of the regression coefficients $C^i_m$ are zero. 
For the quantity 


;7.18) 


Y' {ynB xl Y ^-u(w-«)+»+w 
£» v\ * ’ 


which would have occurred in (7.11) had it not been for the restriction $B_{i\sigma} = 0$ 
($i \neq 1$), can now be expressed in terms of $B_\beta$ even without this restriction. 
By definition 

y ti &* = 6 mm B u B mi B mi 

and the vanishing of the $B_{mi}$ is equivalent to the vanishing of the regression 
coefficients $C^i_m$ associated with $x^m$. And since 

| Bn - ua nn B mi B* |=5- ua nm B iJ B mi B mi , 


we can write (7.18) in the form 


V 1 T[W - q)+h} 
£&v\r[$(N- q) + h + V Ydu> 




where 


Bff U || = || B iifi - ua mm B mi B„ 


is positive definite provided $u$ is sufficiently small. Thus the moments of $\lambda^{2/N}$ 
can be found from (7.15) if we put $a^{mm}B^{ij}B_{mi}B_{mj} = \gamma_{ij}B^{ij}$ in place of $\gamma B^{11}$. 
Moreover, it can be seen that when the value $m = t + q + 1$ is substituted 
into (7.15), that expression reduces to 


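$E[(\lambda^{2/N})^h] = e^{-\gamma_{ij}B^{ij}} \sum_{\nu=0}^{\infty} \dfrac{(\gamma_{ij}B^{ij})^{\nu}}{\nu!}\ \dfrac{B[\frac12(N-m+1)+h,\ \frac12(m-q-1)+\nu]}{B[\frac12(N-m+1),\ \frac12(m-q-1)+\nu]},$

so that $\lambda^{2/N}$ is distributed like $w$, where 

(7.19) $f(w) = e^{-\gamma_{ij}B^{ij}} \sum_{\nu=0}^{\infty} \dfrac{(\gamma_{ij}B^{ij})^{\nu}}{\nu!}\ \dfrac{w^{\frac12(N-m+1)-1}\ (1-w)^{\frac12(m-q-1)+\nu-1}}{B[\frac12(N-m+1),\ \frac12(m-q-1)+\nu]}.$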
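
A density of the kind in (7.19) is a mixture of Beta densities with Poisson weights, and its moments are the correspondingly weighted ratios of Beta functions. The sketch below checks this numerically under stated assumptions: `a` stands for $\frac12(N-m+1)$, `b` for $\frac12(m-q-1)$ and `g` for the positive quantity in the exponent; the truncation of the series, the midpoint rule and the numerical values are our own illustrative choices.

```python
import math


def log_beta(a, b):
    """Logarithm of the complete Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def mixture_density(w, a, b, g, terms=60):
    """exp(-g) * sum over v of g^v / v! * w^(a-1) (1-w)^(b+v-1) / B(a, b+v)."""
    total = 0.0
    for v in range(terms):
        log_term = (-g + v * math.log(g) - math.lgamma(v + 1.0)
                    + (a - 1.0) * math.log(w)
                    + (b + v - 1.0) * math.log(1.0 - w)
                    - log_beta(a, b + v))
        total += math.exp(log_term)
    return total


def mixture_moment(h, a, b, g, terms=60):
    """exp(-g) * sum over v of g^v / v! * B(a + h, b + v) / B(a, b + v)."""
    total = 0.0
    for v in range(terms):
        log_term = (-g + v * math.log(g) - math.lgamma(v + 1.0)
                    + log_beta(a + h, b + v) - log_beta(a, b + v))
        total += math.exp(log_term)
    return total


def integrate_unit_interval(func, n=2000):
    """Midpoint rule on (0, 1); the end points themselves are never evaluated."""
    step = 1.0 / n
    return sum(func((k + 0.5) * step) for k in range(n)) * step
```

With, say, $N = 12$, $m = 6$, $q = 1$ (so `a = 3.5`, `b = 2.0`) and `g = 0.8`, the density should integrate to one and its first moment should agree with `mixture_moment(1.0, a, b, g)`.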





The distribution law of $\lambda^{2/N}$ for this case is thus closely related to that obtained 
in the treatment of the regression problem with one dependent variate in 
section 2. Applying the argument used there, we can obtain: 

Theorem IVa. The likelihood ratio test for the hypothesis that in a population 
of the type (7.1) the variates $x^i$ are independent of $x^m$ (the case $m = t + q + 1$ 
of Theorem IV) is completely unbiased. 

If we specialize the problem somewhat further, considering the case $q = 0$, 
$x^m_\alpha = 1$ (so that $m = t + 1$), we find that the likelihood ratio takes the form 

\ !/w — ^ 1 
' 1 + Nvifix’’ 1 + T’ 

where $v^{ij} = \sum_{\alpha=1}^{N} (x^i_\alpha - \bar{x}^i)(x^j_\alpha - \bar{x}^j)$ and $T$ is Hotelling's generalization [5] 
of Student's ratio. In this case we are testing the hypothesis that the $x^i$ are 
distributed with zero means. The exact distribution law of 

m = 1 -X W 
X !/w 

was recently published by P. L. Hsu [6], who obtained it in a very elegant 
fashion by means of the Laplace transform. He has also shown that the 
resulting test is most powerful in the sense that, of all critical regions $S$ for which 
P[iCS)=e| + R(b) 

(where $\varepsilon$ and $a$ are independent of the $B$'s, and of the means $b_i$, and $R$ is an 
infinitesimal of at least the third order as all $b_i$ tend to zero), the critical region 
defined by 

$S$: $T > T_\varepsilon$

has the largest possible value of $a$. Tang's tables [11] make it evident that 
this largest possible value of $a$ is actually positive and that the test is in fact 
unbiased for all values of the $b$'s when $\varepsilon = .05$ or $\varepsilon = .01$. The results of this 
section may be used to show that this property extends to all probability levels 
other than $\varepsilon = 0$ and $\varepsilon = 1$. 
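
For the special case $q = 0$ discussed above, the connection between the likelihood ratio and a quadratic form in the sample means can be checked on a small sample. The sketch assumes that the criterion is the ratio of the determinant of sums of products about the sample means to the determinant of sums of products about zero; the function name and the data are hypothetical, and the usual convention $T^2 = N(N-1)\,\bar{x}' v^{-1} \bar{x}$ is mentioned only as an assumption about normalization, which may differ from that of the text.

```python
def zero_mean_criterion(sample):
    """Return (|v| / |a|, 1 / (1 + N * xbar' v^{-1} xbar)) for bivariate data.

    v holds sums of products about the sample means, a holds sums of
    products about zero, and N is the number of observations.
    """
    n = len(sample)
    mean = [sum(row[i] for row in sample) / n for i in range(2)]
    v = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in sample)
          for j in range(2)] for i in range(2)]
    a = [[sum(row[i] * row[j] for row in sample) for j in range(2)]
         for i in range(2)]
    det_v = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # Quadratic form xbar' v^{-1} xbar from the explicit inverse of a 2x2 matrix.
    quad = (v[1][1] * mean[0] ** 2 - 2.0 * v[0][1] * mean[0] * mean[1]
            + v[0][0] * mean[1] ** 2) / det_v
    return det_v / det_a, 1.0 / (1.0 + n * quad)
```

The two returned values agree because the matrix about zero differs from the matrix about the means only by $N$ times the product of the means, which is again the bordered determinant reduction.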

The application of Hotelling's $T$ is by no means confined to the above case. 
Other hypotheses which can be tested by means of this statistic are discussed 
by Hsu [6]. In addition it is now known that the Studentized $D^2$, devised 
by Mahalanobis for measuring the “distance” between two normal 
multivariate populations, is proportional to Hotelling's $T$. This fact is pointed out 
by R. C. Bose and N. Roy [1], who have obtained the exact distribution of $D^2$ 
for the case in which the two populations from which the samples are drawn 
are assumed to have the same matrix of variances and covariances, but are 
allowed to have different sets of means; their work, however, is quite independent 
of Hsu's. They also note that $D^2$ is proportional to the ratio which arises in 
Fisher's method of multiple measurements [4]. 




8. Summary. The method of likelihood-ratios is of practical as well as 
theoretical importance, because it provides a unified approach to the problem of 
testing statistical hypotheses. In this paper we have investigated many of the 
tests which this method yields when applied to hypotheses about sets of 
regression coefficients and covariances in normal populations. By studying the 
probability functions of the corresponding $\lambda$-criteria we are able to show that 
these tests are “good,” in the sense that they are unbiased even for small samples. 

Among the completely unbiased tests which can be based on the 
likelihood-ratio method, our discussion includes: the multiple correlation coefficient, with 
or without fixed variates [13]; Hotelling's generalized $T$ test [6] and the 
statistically equivalent “Studentized $D^2$” [1]; the ordinary analysis of variance 
and covariance for orthogonal or non-orthogonal data [11, 16], as well as related 
tests of linear hypotheses in the case of one chance variable. 

With respect to the analysis of variance for two or more variables [15] and 
certain other hypotheses regarding regression coefficients in multivariate 
populations, though there are indications that the tests are completely unbiased, we 
have succeeded in demonstrating this property only in the local sense. 

Finally, the likelihood-ratio test for the hypothesis that the variates fall into 
certain specified mutually independent sets [14] is shown to be unbiased, at 
least locally, and has the additional property described in Theorem IIIa. 

In conclusion, much more than a word of acknowledgment is due to Professor 
S. S. Wilks of Princeton University, to whom the writer is greatly indebted for 
advice and encouragement. 


REFERENCES 

[1] R. C. Bose and N. Roy, “The Distribution of the Studentized $D^2$-statistic,” Sankhyā, 
Vol. 4 (1938-39), pp. 19-38. 

[2] G. W. Brown, “On the Power of the $L_1$ Test for Equality of Several Variances,” 
Annals of Math. Stat., Vol. 10 (1939), p. 124. 

[3] R. A. Fisher, “The General Sampling Distribution of the Multiple Correlation 
Coefficient,” Proc. Roy. Soc. Lond., series A, Vol. 121 (1928), pp. 196-203. 

[4] R. A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” Annals 
of Eugenics, Vol. 7 (1936), pp. 179-188. 

[5] H. Hotelling, “The Generalization of Student's Ratio,” Annals of Math. Stat., 
Vol. 2 (1931), pp. 369-378. 

[6] P. L. Hsu, “Notes on Hotelling's Generalized $T$,” Annals of Math. Stat., Vol. 9 (1938), 
pp. 231-243. 

[7] S. Kolodzieczyk, “On an Important Class of Statistical Hypotheses,” Biometrika, 
Vol. 27 (1936), pp. 161-190. 

[8] J. Neyman and E. S. Pearson, “On the Use and Interpretation of Certain Test Criteria 
for Purposes of Statistical Inference,” Biometrika, Vol. 20A (1928), pp. 176-240. 

[9] J. Neyman and E. S. Pearson, “The Testing of Statistical Hypotheses in Relation 
to Probabilities a Priori,” Proc. Camb. Phil. Soc., Vol. 29 (1933), p. 493. 

[10] J. Neyman and E. S. Pearson, “Contributions to the Theory of Testing Statistical 
Hypotheses,” Stat. Res. Mem., Vol. 1 (1936), pp. 1-37, Vol. 2 (1938), pp. 26-67. 

[11] P. C. Tang, “The Power Function of the Analysis of Variance Tests with Tables 
and Illustrations of Their Use,” Stat. Res. Mem., Vol. 2 (1938), pp. 126-149 + 
viii. 

[12] S. S. Wilks, “Certain Generalizations in the Analysis of Variance,” Biometrika, Vol. 
24 (1932), pp. 471-494. 

[13] S. S. Wilks, “Moment-generating Operators for Determinants of Product Moments 
in Samples from a Normal System,” Ann. of Math., Vol. 36 (1934), pp. 312-340. 

[14] S. S. Wilks, “On the Independence of k Sets of Normally Distributed Statistical 
Variables,” Econometrica, Vol. 3 (1936), pp. 309-326. 

[15] S. S. Wilks, “The Analysis of Variance for Two or More Variables,” Report of Third 
Annual Research Conference on Economics and Statistics, (1937), pp. 82-86. 

[16] S. S. Wilks, “The Analysis of Variance and Covariance in Non-Orthogonal Data,” 
Metron, Vol. XIII (1938), pp. 141-164. 

[17] J. Wishart, “The Generalized Product Moment Distribution in a Normal 
Distribution,” Biometrika, Vol. 20A (1928), pp. 32-62. 

Princeton University, 

Princeton, N. J. 



STATISTICAL SEMINVARIANTS AND THEIR ESTIMATES WITH 
PARTICULAR EMPHASIS ON THEIR RELATION TO 
ALGEBRAIC INVARIANTS 


By Paul L. Dressel 

TABLE OF CONTENTS 

Introduction 

I. The Relation of the Algebraic Seminvariant Theory to the Moment Functions of 
Statistics 

1. Definitions 
2. Complete Systems of Seminvariants 
3. The MacMahon Non-Unitary Symmetric Function Principle 
4. A Third Complete System of Seminvariants 
5. Linearly Independent Seminvariants 
6. The Roberts Theorem 
7. Statistical Distributions Represented by Binary Forms 
8. Three Systems of Statistical Seminvariants 
9. Linearly Independent Statistical Seminvariants 
10. Statistical Invariants 
11. Seminvariants and Invariants of Samples 

II. Estimates 

1. Power Product Seminvariants 
2. Unbiased Estimates of Rational Integral Moment Functions 
3. Computation of Estimates 
4. The Dwyer Double Expansion Theorem 
5. Estimates of all Seminvariants of Weight ≤ 8 
6. Computation Checks 
7. Estimates as Sums of Simple Seminvariants 
8. The Estimates of the kb 
9. Other Simple Seminvariants which are Invariant under Estimate 
10. Composite Seminvariants which are Invariant under Estimate 

Conclusion 

References 


INTRODUCTION 

An important portion of algebraic invariant theory has been that devoted to a 
certain class of invariants called seminvariants, semi-invariants, or more rarely, 
half-invariants. Of these terms, “seminvariant” seems to be the one now 
commonly accepted. The same three terms have been applied at various times 
and by various writers to a system of moment functions of importance in sta¬ 
tistical theory. The statistician using these terms has frequently done so with 
an apology for appropriating a term of the algebraist. As a portion of this 
paper we shall show that the moment functions of this system are actually 
algebraic seminvariants, and that there are other systems of moment functions 
which are equally entitled to the name seminvariant. 






























The study of the statistical seminvariants of a population leads naturally to 
consideration of the problem of obtaining from a sample unbiased estimates of 
the value of these seminvariants. Estimates of this kind have been defined 
and computed by previous authors, but no simple method of obtaining the 
estimates has been given. In this paper a simple procedure for calculation is 
given and it is furthermore demonstrated that these estimates form an important 
phase of statistical seminvariant theory. 

The system of notation used for moment functions is that of R. A. Fisher, 
although the actual letters used in representing particular moment functions are 
not altogether the same as those used by Fisher. In general, a moment function 
of the population has been indicated by a Greek letter, the corresponding sample 
moment function by the corresponding English letter and the estimate by the 
corresponding capital English letter. 

A list of references appears at the end of the paper. Each reference has been 
assigned a number and this number placed in square brackets is used in the body 
of the paper to indicate the reference. Pages of the reference are indicated by 
additional numbers inserted in the parentheses and separated from the reference 
number by a semicolon. 


I. THE RELATION OF THE ALGEBRAIC SEMINVARIANT THEORY TO THE MOMENT 

FUNCTIONS OF STATISTICS 


The purposes of this chapter are: (1) to review briefly and give adequate
references to certain important phases of algebraic seminvariant theory, (2) to
apply this material to the moment functions of statistics.

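1. Definitions. Any function of the coefficients of the binary form

(1) $f = \sum_{r=0}^{n} \binom{n}{r} a_r X^{n-r} Y^{r}, \quad a_0 \neq 0,$

which is invariant under the transformation

(2) $X = \gamma_1 \xi + \gamma_2 \eta, \quad Y = \delta_1 \xi + \delta_2 \eta, \quad \Delta = \begin{vmatrix} \gamma_1 & \gamma_2 \\ \delta_1 & \delta_2 \end{vmatrix} \neq 0,$

is called an invariant of the form f. See Dickson [1; 31-36].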

Any function of the coefficients of f which is invariant under the transformation

(3) $X = \xi + \gamma \eta, \quad Y = \eta,$

is called a seminvariant of f.

The two operators

(4) $\Omega = \sum_{i=1}^{n} i\, a_{i-1} \frac{\partial}{\partial a_i}, \quad O = \sum_{i=1}^{n} (n - i + 1)\, a_i \frac{\partial}{\partial a_{i-1}},$

are of fundamental importance in the theory of algebraic invariants and
seminvariants and, indeed, invariants and seminvariants may be defined by means
of these operators. A necessary and sufficient condition that a homogeneous
isobaric function of the coefficients of f be an invariant is that it be annihilated
by both $\Omega$ and $O$. See Elliott [2; 113, 124]. The necessary and sufficient
condition that a homogeneous isobaric function of the coefficients of f be a
seminvariant is that it be annihilated by $\Omega$. See Elliott [2; 127].

It should be noted that there is nothing in the definitions above which requires
that invariants or seminvariants be integral, although usually only this type is
discussed. In what follows we shall find it more profitable to discuss homogeneous
isobaric fractional seminvariants, the fractional quality resulting from
the appearance of $a_0$ in the denominator.


2. Complete Systems of Seminvariants. By direct application of the
transformation (3) to f the system of seminvariants [1; 47]

(5) $A_r = \sum_{i=0}^{r} (-1)^{i} \binom{r}{i} \frac{a_{r-i}\, a_1^{i}}{a_0^{i+1}}, \quad r \leq n,$

is obtained. This system is a complete system, [2; 44, 205, 206], in the sense
that all other seminvariants fractional in $a_0$ and of degree 0 are expressible
rationally and integrally in terms of this system.

Other such systems can be defined. The system of minimum degree seminvariants,
the seminvariants of even weight being of degree 2 and those of odd
weight being of degree 3, has played an important role in the algebraic
seminvariant theory. Elliott [2; 207-209] discusses this system and gives the general
formula for the even weight seminvariants of the system. So far as the present
writer has been able to discover the general formula for the odd weight
seminvariants has never been published, although Hammond [3] may have obtained it.
After some lengthy but not difficult computation the result has been obtained,
so that the last mentioned system of seminvariants is completely defined by

(6) $C_{2r} = \frac{1}{2} \sum_{i=0}^{2r} (-1)^{i} \binom{2r}{i} \frac{a_i\, a_{2r-i}}{a_0^{2}},$

$C_{2r+1} = \sum_{i=0}^{r} (-1)^{i+r} \binom{2r}{i+r} \frac{2i+1}{i+r+1} \frac{a_{r-i}\, a_{r+i+1}}{a_0^{2}} + \sum_{i=0}^{2r} (-1)^{i+1} \binom{2r}{i} \frac{a_1 a_i\, a_{2r-i}}{a_0^{3}}.$

It is easily demonstrated that for each of the above seminvariants, and in 
fact for any seminvariant, the sum of the numerical coefficients is zero. Dickson 
[1; 55] gives a suggestion leading to a very simple proof. 
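The invariance claimed for the systems (5) and (6) can be checked numerically. The following is a minimal sketch, assuming the forms $A_r = \sum_i (-1)^i \binom{r}{i} a_{r-i} a_1^i / a_0^{i+1}$ for (5) and the even and odd weight formulas of (6) as read from the surrounding text; the function names `translate`, `A`, `C_even` and `C_odd` are illustrative and are not taken from the paper. The transformation (3) replaces $a_r$ by $\sum_j \binom{r}{j} a_j \gamma^{r-j}$, and exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction
from math import comb


def translate(a, gamma):
    """Coefficients of f after X = xi + gamma*eta, Y = eta (transformation (3))."""
    return [sum(comb(r, j) * a[j] * gamma ** (r - j) for j in range(r + 1))
            for r in range(len(a))]


def A(r, a):
    """Assumed form of system (5)."""
    return sum((-1) ** i * comb(r, i) * a[r - i] * a[1] ** i / a[0] ** (i + 1)
               for i in range(r + 1))


def C_even(r, a):
    """Assumed form of system (6) for weight 2r (degree 2)."""
    total = sum((-1) ** i * comb(2 * r, i) * a[i] * a[2 * r - i]
                for i in range(2 * r + 1))
    return Fraction(1, 2) * total / a[0] ** 2


def C_odd(r, a):
    """Assumed form of system (6) for weight 2r + 1 (degree 3)."""
    first = sum((-1) ** (i + r) * comb(2 * r, i + r)
                * Fraction(2 * i + 1, i + r + 1) * a[r - i] * a[r + i + 1]
                for i in range(r + 1)) / a[0] ** 2
    second = sum((-1) ** (i + 1) * comb(2 * r, i) * a[1] * a[i] * a[2 * r - i]
                 for i in range(2 * r + 1)) / a[0] ** 3
    return first + second


# Arbitrary coefficients a_0, ..., a_7 and an arbitrary translation gamma.
coefficients = [Fraction(v) for v in (2, 3, -1, 5, 7, -4, 6, 1)]
moved = translate(coefficients, Fraction(3, 7))
unchanged = all(C_odd(r, coefficients) == C_odd(r, moved) for r in (1, 2, 3))
```

With these assumptions `unchanged` should be true for any choice of coefficients and of $\gamma$, since each function depends only on differences of the roots.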


3. The MacMahon Non-Unitary Symmetric Function Principle. Denoting
the roots of $\sum_{i=0}^{n} \binom{n}{i} a_i X^{n-i} = 0$ by $\alpha_1, \alpha_2, \cdots, \alpha_n$,
the r-th power sum of these roots is defined by

(7) $s_r = \sum_{i=1}^{n} \alpha_i^{r}.$

The form f may be written $\prod_{i=1}^{n} (X - \alpha_i Y)$.

By a result due to MacMahon [4; 131] the seminvariants of the form f are
identical, except for numerical factors, with those symmetric functions of the
roots of

(8) $\sum_{i=0}^{n} \frac{a_i}{i!} Y^{n-i} = 0$

which when expressed in terms of sums of powers of these roots do not contain
$s_1$. MacMahon called such symmetric functions “non-unitary.”

As a result of this theorem, MacMahon was able to discuss the seminvariants
of a binary form of infinite order by discussing the non-unitary symmetric
functions of the roots of $\sum_{i=0}^{\infty} \frac{a_i}{i!} Y^{i} = 0$.


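4. A Third Complete System of Seminvariants. By application of the result
stated in the previous section, a third complete system of seminvariants can be
immediately obtained. Obviously the power sums $s_r$, $r > 1$, are independent
of $s_1$. By the Waring formula, Burnside and Panton [5; 91-92], if

$\sum_{i=0}^{n} c_i Y^{n-i} = c_0 \prod_{i=1}^{n} (Y - \alpha_i),$

then

(9) $s_r = \sum \frac{(-1)^{\rho}\, r\, (\rho - 1)!}{\pi_1!\, \pi_2! \cdots \pi_n!} \left(\frac{c_1}{c_0}\right)^{\pi_1} \left(\frac{c_2}{c_0}\right)^{\pi_2} \cdots \left(\frac{c_n}{c_0}\right)^{\pi_n},$

wherein

$\rho = \sum_{i=1}^{n} \pi_i, \quad \sum_{i=1}^{n} i\, \pi_i = r.$

Then for $c_i = a_i / i!$,

(10) $-(r - 1)!\, s_r = \sum \frac{(-1)^{\rho - 1}\, r!\, (\rho - 1)!}{\pi_1! \cdots \pi_n!\, (2!)^{\pi_2} \cdots (n!)^{\pi_n}} \left(\frac{a_1}{a_0}\right)^{\pi_1} \cdots \left(\frac{a_n}{a_0}\right)^{\pi_n}.$

Placing $B_r = -(r - 1)!\, s_r$ the B's form a complete system of seminvariants.
This result has some interesting statistical connections which will be mentioned
later.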
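The Waring formula can be tried on a small polynomial with known roots. The sketch below assumes the reading $\sum c_i Y^{n-i} = c_0 \prod (Y - \alpha_i)$ with the sum in (9) running over all non-negative $\pi_1, \cdots, \pi_n$ such that $\sum i \pi_i = r$; the helper names (`multiplicity_vectors`, `poly_from_roots`, `waring_power_sum`) are hypothetical.

```python
from fractions import Fraction
from math import factorial


def multiplicity_vectors(r, n):
    """Yield (pi_1, ..., pi_n), non-negative integers with sum(i * pi_i) = r."""
    def rec(i, remaining):
        if i > n:
            if remaining == 0:
                yield ()
            return
        for count in range(remaining // i + 1):
            for rest in rec(i + 1, remaining - i * count):
                yield (count,) + rest
    return rec(1, r)


def poly_from_roots(alphas):
    """Coefficients c_0, ..., c_n of prod (Y - alpha_i), highest power first."""
    c = [Fraction(1)]
    for alpha in alphas:
        # Multiply the current polynomial by (Y - alpha).
        c = [(c[k] if k < len(c) else 0) - alpha * (c[k - 1] if k >= 1 else 0)
             for k in range(len(c) + 1)]
    return c


def waring_power_sum(r, c):
    """Power sum s_r of the roots, computed from the coefficients by (9)."""
    n = len(c) - 1
    total = Fraction(0)
    for pis in multiplicity_vectors(r, n):
        rho = sum(pis)
        term = Fraction((-1) ** rho * r * factorial(rho - 1))
        for i, pi in enumerate(pis, start=1):
            term *= (Fraction(c[i]) / Fraction(c[0])) ** pi / factorial(pi)
        total += term
    return total


roots = [Fraction(1), Fraction(-2), Fraction(3, 2)]
c = poly_from_roots(roots)
agrees = all(waring_power_sum(r, c) == sum(x ** r for x in roots)
             for r in range(1, 7))
```

Replacing `c[i]` by $a_i / i!$ and multiplying by $-(r-1)!$ would give the system of B's in (10); that step is left out to keep the sketch short.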


5. Linearly Independent Seminvariants. It follows from the MacMahon
non-unitary symmetric function principle, or it can be proved easily in other ways,
that the number of linearly independent seminvariants of a given weight r is
equal to the number of partitions of r which contain no unit part. Furthermore
we have at our disposal a simple method for obtaining a set of linearly
independent seminvariants of any given weight.

For many purposes the power product defined by Dwyer [6; 13] is more
useful than the customary monomial symmetric function. The power product
is defined by the right hand member and indicated by the left hand member of

(11) $(q_1 q_2 \cdots q_r) = \sum_{i_1 \neq i_2 \neq \cdots \neq i_r} \alpha_{i_1}^{q_1} \alpha_{i_2}^{q_2} \cdots \alpha_{i_r}^{q_r},$

where, for convenience, $q_1 \geq q_2 \geq \cdots \geq q_r$. The monomial symmetric
function which will be denoted by $M(q_1 \cdots q_r)$ is related to the power product by
the identity

(12) $\pi_1! \cdots \pi_s!\, M(q_1^{\pi_1} q_2^{\pi_2} \cdots q_s^{\pi_s}) = (q_1^{\pi_1} q_2^{\pi_2} \cdots q_s^{\pi_s}),$

so that a distinction occurs only when there are repeated exponents in the
summation of (11).

If we desire a system of linearly independent seminvariants of weight 6, by
the MacMahon principle we need only to compute the values of the power
products (6), (42), (33), (222) in terms of the a's. In a somewhat different
form these will be presented later.
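The counting rule of this section and the relation (12) between power products and monomial symmetric functions can be illustrated directly. This is a small sketch with hypothetical helper names; the power product is evaluated by brute force as a sum over ordered selections of distinct observations, which is practical only for short lists.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def partitions_without_unit_part(r, largest=None):
    """Partitions of r into parts >= 2, each written in non-increasing order."""
    if largest is None:
        largest = r
    if r == 0:
        return [()]
    result = []
    for part in range(min(r, largest), 1, -1):
        for rest in partitions_without_unit_part(r - part, part):
            result.append((part,) + rest)
    return result


def power_product(exponents, xs):
    """(q_1 ... q_r): sum over ordered selections of distinct observations."""
    total = 0
    for chosen in permutations(xs, len(exponents)):
        term = 1
        for x, q in zip(chosen, exponents):
            term *= x ** q
        total += term
    return total


def monomial_symmetric(exponents, xs):
    """M(q_1 ... q_r) by identity (12): divide out the repeated exponents."""
    repeats = 1
    for multiplicity in Counter(exponents).values():
        repeats *= factorial(multiplicity)
    return power_product(exponents, xs) // repeats


weight_six = partitions_without_unit_part(6)
counts = [len(partitions_without_unit_part(r)) for r in range(2, 9)]
```

Here `weight_six` lists the four partitions (6), (42), (33), (222) named in the text, and `counts` gives the number of linearly independent seminvariants for weights 2 through 8.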

6. The Roberts Theorem. Roberts, see [2; 231] and [5; 108], demonstrated
the existence of a duality relationship between power sums, s's, and coefficients,
a's, such that corresponding to any seminvariant in terms of a's there exists
a seminvariant in terms of s's obtained by replacing $a_i$ by $s_i$. The proof
consists of showing that the annihilator for seminvariants in terms of power sums
is identical in form with $\Omega$, $a_i$ being replaced by $s_i$.

As a result of this duality, each of the systems of seminvariants which have
been obtained yields, upon replacement of $a_i$ by $s_i$, another system of
seminvariants. In particular cases it may happen that the systems are identical
when the identities connecting the $a_i$ and $s_i$ are taken into consideration.

We next wish to show that the systems of power sum seminvariants thus
obtained either are identical with certain well known statistical moment
functions or lead to new ones.

7. Statistical Distributions Represented by Binary Forms. The fact that
statistical distributions may be represented by polynomials has long been
recognized by statisticians, see Thiele [7; 24-26] and Bertilsen [8]. Indeed it
was this fact which led Thiele to the definition of the seminvariants now called
by his name. If we have given n observations $\alpha_1, \alpha_2, \cdots, \alpha_n$, form the
polynomial

(13) $F = \prod_{i=1}^{n} (X - \alpha_i) = \sum_{i=0}^{n} \binom{n}{i} \frac{a_i}{a_0} X^{n-i}.$

F is not a binary form, but the seminvariant theory of binary forms is applicable 
since seminvariants are functions of the differences of the roots and are inde¬ 
pendent of the X and Y , which appear merely as convenient symbols to indicate 
the various terms of the algebraic form. 

For distributions containing an infinite number of items the form F is of 
infinite order, but discussion of its seminvariants may be carried on by use of 
the MacMahon principle given in section 3. 


8. Three Systems of Statistical Seminvariants. Before exhibiting some systems
of statistical seminvariants it may be well to consider the meaning of
“statistical seminvariant,” for this phrase has been undefined. In fact the use
of the phrase is merely a matter of convenience in that it emphasises the fact
that seminvariant moment functions have not previously been regarded as
algebraic seminvariants. As used here a statistical seminvariant is an algebraic
seminvariant which has some application in statistical theory.

The system of seminvariants (5) yields by application of the Roberts' Theorem
the well known system of statistical seminvariants usually called central
moments. If $\mu'_r = s_r / n = s_r / s_0$, the general formula may be written

(14) $\mu_r = \sum_{i=0}^{r} \binom{r}{i} \mu'_{r-i}\, (-\mu'_1)^{i}.$

The system of seminvariants (6) likewise leads to

(15) $\kappa_{2r} = \frac{1}{2} \sum_{i=0}^{2r} (-1)^{i} \binom{2r}{i} \mu'_i\, \mu'_{2r-i},$

$\kappa_{2r+1} = \sum_{i=0}^{r} (-1)^{i+r} \binom{2r}{i+r} \frac{2i+1}{i+r+1}\, \mu'_{r-i}\, \mu'_{r+i+1} + \sum_{i=0}^{2r} (-1)^{i+1} \binom{2r}{i} \mu'_1\, \mu'_i\, \mu'_{2r-i},$

a system which seems never to have been used by statisticians.
The system (10) leads to the well known Thiele seminvariants

(16) $\lambda_r = \sum \frac{(-1)^{\rho - 1}\, r!\, (\rho - 1)!\, (\mu'_1)^{\pi_1} (\mu'_2)^{\pi_2} \cdots (\mu'_r)^{\pi_r}}{\pi_1!\, \pi_2! \cdots \pi_r!\, (2!)^{\pi_2} \cdots (r!)^{\pi_r}}.$

From sections 3 and 4 it is apparent that the general formula for the Thiele
seminvariants is a special case of the Waring formula for power sums in terms
of coefficients. It does not seem that this fact has been previously recognized.
An equivalent way of stating this idea is to say that the Thiele seminvariant
$\lambda_r$ is, except for the factor $-(r - 1)!$, the sum of the r-th powers of the roots of
the equation obtained by setting the moment generating function,

$M(Y) = \sum_{i=0}^{\infty} \mu'_i \frac{Y^{i}}{i!},$

equal to zero.
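Formula (16) can be evaluated mechanically from the raw moments of a set of observations. The sketch below assumes the reading $\lambda_r = \sum (-1)^{\rho-1} r! (\rho-1)! \prod (\mu'_i)^{\pi_i} / \prod \pi_i! (i!)^{\pi_i}$, the sum being over all $\pi$'s with $\sum i \pi_i = r$ and $\rho = \sum \pi_i$; the function names are illustrative only. It then checks the seminvariant property, namely that $\lambda_r$ for $r \geq 2$ is unchanged when every observation is shifted by the same amount.

```python
from fractions import Fraction
from math import factorial


def multiplicity_vectors(r):
    """Yield (pi_1, ..., pi_r), non-negative integers with sum(i * pi_i) = r."""
    def rec(i, remaining):
        if i > r:
            if remaining == 0:
                yield ()
            return
        for count in range(remaining // i + 1):
            for rest in rec(i + 1, remaining - i * count):
                yield (count,) + rest
    return rec(1, r)


def raw_moments(xs, order):
    """[mu'_0, mu'_1, ..., mu'_order] of the observations xs."""
    n = len(xs)
    return [Fraction(sum(Fraction(x) ** k for x in xs), n)
            for k in range(order + 1)]


def thiele(r, mu):
    """Thiele seminvariant lambda_r from raw moments, assumed form of (16)."""
    total = Fraction(0)
    for pis in multiplicity_vectors(r):
        rho = sum(pis)
        term = Fraction((-1) ** (rho - 1) * factorial(r) * factorial(rho - 1))
        for i, pi in enumerate(pis, start=1):
            term *= mu[i] ** pi / (factorial(pi) * factorial(i) ** pi)
        total += term
    return total


observations = [1, 4, 4, 7, 10]
mu = raw_moments(observations, 6)
mu_shifted = raw_moments([x + 3 for x in observations], 6)
invariant = all(thiele(r, mu) == thiele(r, mu_shifted) for r in range(2, 7))
```

Only $\lambda_1$, the mean, changes under the shift; the remaining values are functions of the differences of the observations.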





It is of historical interest to note that MacMahon published his non-unitary
function principle and the resulting set of seminvariants in 1884. Cayley [8]
published an article in 1885 dealing with this same system. Roberts' Theorem
having been known for some time (probably about 20 years), it seems probable
that MacMahon and Cayley were aware of the Thiele seminvariants four to
five years before Thiele’s definition [9] by an entirely different method.

9. Linearly Independent Statistical Seminvariants. At the end of section 5
a method was indicated whereby a complete set of linearly independent
seminvariants of a given weight r could be obtained. It has been noted previously
that the one part symmetric function $s_r$ or (r) leads to the Thiele seminvariant $\lambda_r$.
As a further illustration consider the power product (22). From a table of
symmetric functions we find that

$(22) = \frac{2 a_4}{4!\, a_0} - \frac{2 a_3 a_1}{3!\, a_0^{2}} + \frac{a_2^{2}}{2!\, 2!\, a_0^{2}} = \frac{2}{4!} \left( \frac{a_4}{a_0} - \frac{4 a_3 a_1}{a_0^{2}} + \frac{3 a_2^{2}}{a_0^{2}} \right),$

and by the Roberts’ Theorem the statistical seminvariant

$\mu'_4 - 4 \mu'_3 \mu'_1 + 3 \mu_2'^{\,2}$

is obtained. In similar fashion a system of linearly independent seminvariants
of weight ≤ 8 has been computed and is given in Table I. For the sake of
brevity they are expressed in terms of central moments. Hence the degree, by
which is meant the maximum degree in the $\mu'$'s, is not apparent in the table.
This definition of degree associates with the statistical seminvariant the degree
(in the usual sense) of the corresponding homogeneous integral seminvariant.

10. Statistical Invariants. If the transformation

(17) $X = \xi + m k \eta, \quad Y = m \eta$

is applied to the binary form f and if, in particular,

$k = -\frac{a_1}{a_0}, \quad m = A_2^{-1/2},$

one system of invariants of f under this transformation is found to be

(18) $D_r = A_r / A_2^{r/2}, \quad r \leq n,$

where $A_r$ is defined in (5). By the Roberts Theorem we obtain the fact that
the standard moment $\mu_r / \mu_2^{r/2}$ is an invariant of f under this transformation.
Thus the standard moments, or standard seminvariants in general, have also
an algebraic connection. The effect of the transformation (17) on the roots of f
is indicated by

$X - \alpha_i Y = \xi + m k \eta - m \alpha_i \eta = \xi - m(\alpha_i - k)\eta.$

If m and k are defined as above, the result is the equivalent of measuring in
standard units, that is, of taking deviations from the mean divided by $\sqrt{\mu_2}$.

The system (18) is not a system of algebraic invariants, for algebraic invariants
must be invariant under rotation, translation and change of scale, or stretching.
The component parts of the above system are invariant only under the last two
types of transformation. In statistics translation and change of scale ordinarily
constitute the only desired transformations so that the standard seminvariants
$\mu_r / \mu_2^{r/2}$, $\lambda_r / \lambda_2^{r/2}$, $\cdots$ might well be called statistical invariants.


TABLE I

Linearly Independent Seminvariants of Weight ≤ 8

Weight  Degree  Seminvariant

2       2       $\mu_2$

3       3       $\mu_3$

4       4       $\mu_4 - 3\mu_2^2$
        2       $\mu_4 + 3\mu_2^2$

5       5       $\mu_5 - 10\mu_3\mu_2$
        3       $\mu_5 + 2\mu_3\mu_2$

6       6       $\mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3$
        4       $\mu_6 + 5\mu_4\mu_2 - 10\mu_3^2 - 30\mu_2^3$
        3       $\mu_6 - 15\mu_4\mu_2 + 20\mu_3^2 + 30\mu_2^3$
        2       $\mu_6 + 15\mu_4\mu_2 - 10\mu_3^2$

7       7       $\mu_7 - 21\mu_5\mu_2 - 35\mu_4\mu_3 + 210\mu_3\mu_2^2$
        5       $\mu_7 + 9\mu_5\mu_2 - 35\mu_4\mu_3 - 90\mu_3\mu_2^2$
        4       $\mu_7 - 21\mu_5\mu_2 + 25\mu_4\mu_3 + 30\mu_3\mu_2^2$
        3       $\mu_7 + 9\mu_5\mu_2 - 5\mu_4\mu_3$

8       8       $\mu_8 - 28\mu_6\mu_2 - 56\mu_5\mu_3 - 35\mu_4^2 + 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 - 630\mu_2^4$
        6       $\mu_8 + 14\mu_6\mu_2 - 56\mu_5\mu_3 - 35\mu_4^2 - 210\mu_4\mu_2^2 + 140\mu_3^2\mu_2 + 630\mu_2^4$
        5       $\mu_8 - 28\mu_6\mu_2 + 49\mu_5\mu_3 - 35\mu_4^2 + 420\mu_4\mu_2^2 - 490\mu_3^2\mu_2 - 630\mu_2^4$
        4       $\mu_8 - 28\mu_6\mu_2 - 56\mu_5\mu_3 + 105\mu_4^2 - 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 + 630\mu_2^4$
        4       $\mu_8 + 14\mu_6\mu_2 - 56\mu_5\mu_3 + 35\mu_4^2 - 210\mu_4\mu_2^2 + 140\mu_3^2\mu_2$
        3       $\mu_8 - 7\mu_6\mu_2 + 49\mu_5\mu_3 - 35\mu_4^2 + 105\mu_4\mu_2^2 - 70\mu_3^2\mu_2$
        2       $\mu_8 + 28\mu_6\mu_2 - 56\mu_5\mu_3 + 35\mu_4^2$


11. Seminvariants and Invariants of Samples. Consideration of the defini¬ 
tion of seminvariants and invariants shows that: 

1. A seminvariant is a seminvariant not because it is a function of deviations 
from the mean, but because it is a function of the differences of the observations; 

2 . An invariant is an invariant not because it is a seminvariant divided by 
the standard deviation raised to the proper power, but because it is a ratio of 
two seminvariants which are of the same order in powers of the observations. 













These facts are important from the statistical viewpoint because they show
that seminvariants and invariants of samples are also seminvariants and
invariants of the population from which the samples are drawn.


II. ESTIMATES

1. Power Product Seminvariants. The Roberts Theorem set up a duality
relationship between seminvariants expressed in terms of coefficients and
seminvariants in terms of power sums. It can be shown that corresponding to each
pair thus determined there exists a third seminvariant expressed in terms of
power products. This leads to what may be called a triple system of
seminvariants, the interrelationships being most apparent when all three seminvariants
are expressed in terms of the notation defined by (11). The seminvariant
$\frac{a_3}{a_0} - \frac{3 a_2 a_1}{a_0^{2}} + \frac{2 a_1^{3}}{a_0^{3}}$ becomes in this notation

$\frac{(111)}{n^{(3)}} - \frac{3(11)(1)}{n^{(2)}\, n} + \frac{2(1)^{3}}{n^{3}},$

where $n^{(k)} = n(n-1) \cdots (n-k+1)$. The corresponding power sum seminvariant is

$\frac{(3)}{n} - \frac{3(2)(1)}{n^{2}} + \frac{2(1)^{3}}{n^{3}},$

while the power product seminvariant just mentioned is

$\frac{(3)}{n} - \frac{3(21)}{n^{(2)}} + \frac{2(111)}{n^{(3)}}.$

The value of the power product notation lies in the fact that the numerical
coefficients of the three seminvariants are then identical, while this is not the
case when monomial and elementary symmetric functions are used.

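Perhaps a few remarks are in order in regard to the proof of the relationship
above expressed. The annihilator, corresponding to $\Omega$, for seminvariants in
terms of roots is, see [2; 230-31],

$\Omega = \sum_{i=1}^{n} \frac{\partial}{\partial \alpha_i}.$

It is easy to see that

$\Omega\, (p_1^{\pi_1} \cdots p_s^{\pi_s}) = \sum_{i=1}^{s} \pi_i\, p_i\, (p_1^{\pi_1} \cdots p_i^{\pi_i - 1}\, [p_i - 1] \cdots p_s^{\pi_s}),$

and also that, $\rho$ being the number of parts of the power product on the left,

$\frac{(p_1^{\pi_1} \cdots p_{s-1}^{\pi_{s-1}}\, 0)}{n^{(\rho)}} = \frac{(n - \rho + 1)(p_1^{\pi_1} \cdots p_{s-1}^{\pi_{s-1}})}{n^{(\rho)}} = \frac{(p_1^{\pi_1} \cdots p_{s-1}^{\pi_{s-1}})}{n^{(\rho - 1)}}.$

Since

$\Omega\, [(p_1)^{\pi_1} (p_2)^{\pi_2} \cdots (p_s)^{\pi_s}] = \sum_{i=1}^{s} \pi_i\, p_i\, (p_1)^{\pi_1} \cdots (p_i)^{\pi_i - 1} (p_i - 1) \cdots (p_s)^{\pi_s},$

and

$\frac{(p_1)^{\pi_1} (p_2)^{\pi_2} \cdots (p_{s-1})^{\pi_{s-1}} (0)}{n^{\rho}} = \frac{n\, (p_1)^{\pi_1} (p_2)^{\pi_2} \cdots (p_{s-1})^{\pi_{s-1}}}{n^{\rho}} = \frac{(p_1)^{\pi_1} \cdots (p_{s-1})^{\pi_{s-1}}}{n^{\rho - 1}},$

it becomes evident that corresponding to any power sum seminvariant there
exists a power product seminvariant with the same numerical coefficients. The
converse is also true.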


2. Unbiased Estimates of Rational Integral Moment Functions. If $\tau$
represents a population parameter, and if t represents such a function of n
observations that the expected value of t is equal to $\tau$, then t is said to be an unbiased
estimate of $\tau$. See Tschuprow [11; 74-76], Bertilsen [8; 144], and Fisher [12].

Let $(p_1 p_2 \cdots p_s)$ denote a power product computed from a sample, the sample
being from an infinite population. Then it is well known that

$E\left[ \frac{(p_1 p_2 \cdots p_s)}{n^{(s)}} \right] = \mu'_{p_1} \mu'_{p_2} \cdots \mu'_{p_s},$

n being the number of items in the sample. If $E^{-1}$ be interpreted as “unbiased
estimate of,” the above relation may also be written

(19) $E^{-1}[\mu'_{p_1} \mu'_{p_2} \cdots \mu'_{p_s}] = \frac{(p_1 p_2 \cdots p_s)}{n^{(s)}},$

and it is seen at once that the power product seminvariants defined in section 1,
if computed from a sample of n observations, are the unbiased estimates of the
corresponding power sum seminvariants of the infinite population from which
the sample is drawn.
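Relation (19) can be confirmed exactly for a small discrete population by enumerating every ordered sample of size n together with its probability. The following is a sketch under stated assumptions: the population is a hypothetical three-point distribution, $n^{(s)}$ is read as the falling factorial $n(n-1)\cdots(n-s+1)$, and all helper names are invented for the illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def falling(n, k):
    """n^(k) = n (n - 1) ... (n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def power_product(exponents, xs):
    """(p_1 ... p_s): sum over ordered selections of distinct observations."""
    total = 0
    for chosen in permutations(xs, len(exponents)):
        term = 1
        for x, p in zip(chosen, exponents):
            term *= x ** p
        total += term
    return total


def scaled_power_product(exponents, xs):
    """Right-hand member of (19) for the sample xs."""
    return Fraction(power_product(exponents, xs), falling(len(xs), len(exponents)))


def estimate_mu3(xs):
    """Power product seminvariant (3)/n - 3(21)/n^(2) + 2(111)/n^(3)."""
    return (scaled_power_product((3,), xs) - 3 * scaled_power_product((2, 1), xs)
            + 2 * scaled_power_product((1, 1, 1), xs))


def raw_moment(k, values, probs):
    return sum(p * Fraction(v) ** k for v, p in zip(values, probs))


def expected_value(statistic, values, probs, n):
    """Exact expectation over all ordered samples of n independent draws."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        total += weight * statistic([values[i] for i in idx])
    return total


values = [0, 1, 3]                                        # hypothetical population
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mean_of_estimate = expected_value(estimate_mu3, values, probs, 4)
```

Because the expectation is computed exactly, `mean_of_estimate` can be compared with the population value $\mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3$ without any sampling error.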

This provides an algebraic interpretation as well as a different approach to a 
topic which has already aroused considerable interest among statisticians. In 
1927 Bertilsen [8; 144] gave the estimates of the first four Thiele seminvariants 
of the population in terms of Thiele seminvariants of the sample. In 1929 
R. A. Fisher [12] also obtained these results and gave in addition the estimates 
of the fifth and sixth Thiele seminvariants. His results are in terms of sample 
moments. In 1937, P. S. Dwyer [13; 26] gave the estimates of the first five 
population central moments and indicated also means for obtaining the estimate 
of any rational integral isobaric moment function. 

In the remainder of this chapter

(1) Dwyer's method will be extended and perhaps somewhat simplified,

(2) certain properties of this type of estimate will be pointed out,

(3) estimates of all seminvariants of weight ≤ 8 will be made available.


3. Computation of Estimates. From the relationship (19) it is possible to
write down immediately in a simple, although not immediately useful, form the
estimate of any rational integral moment function. Thus the fourth Thiele
seminvariant $\lambda_4$ is given by

$\lambda_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3\mu_2'^{\,2} + 12\mu'_2\mu_1'^{\,2} - 6\mu_1'^{\,4},$

so that the estimate of $\lambda_4$ is

$L_4 = \frac{(4)}{n} - \frac{4(31)}{n^{(2)}} - \frac{3(22)}{n^{(2)}} + \frac{12(211)}{n^{(3)}} - \frac{6(1111)}{n^{(4)}}.$

Since power products are difficult to compute directly, it is necessary to
express the estimates in terms of power sums. Dwyer [6; 30-33] gave a complete
discussion of the problem of expanding power products in terms of power
sums and also gave tables of power products in terms of power sums for
weights ≤ 6. By use of (12) it is also possible to use tables giving monomial
symmetric functions in terms of power sums. One table by J. R. Roe [14;
plate 18] includes all cases of weight ≤ 10.

By use of such a table we find

$(31) = -(4) + (3)(1),$

$(22) = -(4) + (2)(2),$

$(211) = 2(4) - 2(3)(1) - (2)(2) + (2)(1)^{2},$

$(1111) = -6(4) + 8(3)(1) + 3(2)(2) - 6(2)(1)^{2} + (1)^{4}.$

If these results are substituted in $L_4$ above and like terms are collected, it is
found that

$n^{(4)} L_4 = n^{2}(n + 1)(4) - 4n(n + 1)(3)(1) - 3n(n - 1)(2)^{2} + 12n(2)(1)^{2} - 6(1)^{4},$

a result which agrees with that given by R. A. Fisher [12].
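The two expressions for $L_4$, one in power products and one in power sums, can be compared on a set of observations. This is a minimal sketch with invented helper names; the power products are computed by brute force over ordered selections of distinct observations, so the lists should be kept short.

```python
from fractions import Fraction
from itertools import permutations


def falling(n, k):
    """n^(k) = n (n - 1) ... (n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def power_sum(k, xs):
    return sum(x ** k for x in xs)


def power_product(exponents, xs):
    total = 0
    for chosen in permutations(xs, len(exponents)):
        term = 1
        for x, p in zip(chosen, exponents):
            term *= x ** p
        total += term
    return total


def L4_from_power_products(xs):
    """Estimate of lambda_4 written with power products, via relation (19)."""
    n = len(xs)
    return (Fraction(power_product((4,), xs), falling(n, 1))
            - 4 * Fraction(power_product((3, 1), xs), falling(n, 2))
            - 3 * Fraction(power_product((2, 2), xs), falling(n, 2))
            + 12 * Fraction(power_product((2, 1, 1), xs), falling(n, 3))
            - 6 * Fraction(power_product((1, 1, 1, 1), xs), falling(n, 4)))


def L4_from_power_sums(xs):
    """The same estimate after expansion into power sums."""
    n = len(xs)
    s1, s2, s3, s4 = (power_sum(k, xs) for k in (1, 2, 3, 4))
    numerator = (n * n * (n + 1) * s4 - 4 * n * (n + 1) * s3 * s1
                 - 3 * n * (n - 1) * s2 ** 2 + 12 * n * s2 * s1 ** 2 - 6 * s1 ** 4)
    return Fraction(numerator, falling(n, 4))


sample = [2, 3, 5, 7, 11, 13]   # integer observations, at least four of them
same = L4_from_power_products(sample) == L4_from_power_sums(sample)
```

The power sum form needs only four sums over the data, whereas the brute force power products grow rapidly with the number of observations, which is the practical reason for the expansion.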


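4. The Dwyer Double Expansion Theorem. The Dwyer double expansion
theorem, [6; 34] and [11; 37-39], states that if any isobaric sum of power products
of weight r indicated by

(20) $r! \sum \frac{b_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, (p_1^{\pi_1} \cdots p_s^{\pi_s})}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}$

be expanded in terms of power sums in a form indicated by

(21) $r! \sum \frac{a_{r_1^{\nu_1} \cdots r_m^{\nu_m}}\, (r_1)^{\nu_1} \cdots (r_m)^{\nu_m}}{(r_1!)^{\nu_1} \cdots (r_m!)^{\nu_m}\, \nu_1! \cdots \nu_m!},$

then the coefficient $a_r$ of the power sum (r) is given by

(22) $a_r = \sum (-1)^{\rho - 1} (\rho - 1)!\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, b_{p_1^{\pi_1} \cdots p_s^{\pi_s}},$

and that the coefficient $a_{r_1 r_2 \cdots r_m}$ of $(r_1)(r_2) \cdots (r_m)$ is

(23) $a_{r_1 r_2 \cdots r_m} = \overline{a_{r_1} a_{r_2} \cdots a_{r_m}}.$

The barred product indicates a symbolic multiplication by suffixing of
subscripts which is exemplified by

$\overline{a_3 a_2} = \overline{(b_3 - 3b_{21} + 2b_{111})(b_2 - b_{11})} = b_{32} - b_{311} - 3b_{221} + 5b_{2111} - 2b_{11111} = a_{32}.$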

The application of this theorem to the present problem eliminates the use of
tables and permits the independent computation of the coefficient of any particular
product of power sums in the expansion in terms of power sums of any given
estimate. The illustration given by Dwyer [13; 39, 40] exemplifies both of
these points very well.
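The symbolic multiplication by suffixing of subscripts is easy to mechanise. The sketch below represents a polynomial in the b's as a dictionary from a subscript (a tuple of parts in non-increasing order) to its numerical coefficient, and assumes for $a_r$ the coefficients $(-1)^{\rho-1}(\rho-1)!\, r! / \prod (p_i!)^{\pi_i} \pi_i!$, a reading that reproduces the example $a_3 = b_3 - 3b_{21} + 2b_{111}$ of the text. The function names are hypothetical.

```python
from collections import Counter
from math import factorial


def partitions(r, largest=None):
    """All partitions of r as tuples in non-increasing order."""
    if largest is None:
        largest = r
    if r == 0:
        return [()]
    out = []
    for part in range(min(r, largest), 0, -1):
        for rest in partitions(r - part, part):
            out.append((part,) + rest)
    return out


def a_r_in_terms_of_b(r):
    """Coefficients of the b's in a_r, under the assumed reading of (22)."""
    coeffs = {}
    for parts in partitions(r):
        rho = len(parts)
        denom = 1
        for p in parts:
            denom *= factorial(p)
        for multiplicity in Counter(parts).values():
            denom *= factorial(multiplicity)
        coeffs[parts] = (-1) ** (rho - 1) * factorial(rho - 1) * (factorial(r) // denom)
    return coeffs


def barred_product(*factors):
    """Multiply polynomials in the b's by joining (suffixing) their subscripts."""
    result = {(): 1}
    for factor in factors:
        nxt = {}
        for sub1, c1 in result.items():
            for sub2, c2 in factor.items():
                merged = tuple(sorted(sub1 + sub2, reverse=True))
                nxt[merged] = nxt.get(merged, 0) + c1 * c2
        result = nxt
    return result


a3 = a_r_in_terms_of_b(3)
a2 = a_r_in_terms_of_b(2)
a32 = barred_product(a3, a2)
```

With these definitions `a32` holds the five terms of the worked example, and a product of any number of factors can be formed in the same call.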

5. Estimates of all Seminvariants of Weight ≤ 8. If the estimates of any
complete system of seminvariants and all products of these seminvariants up
to and including weight r are known, then the estimates of all seminvariants
of weight ≤ r are obtainable as a linear combination of these known estimates.
For example, suppose that we know the estimates of all Thiele seminvariants
of weight ≤ 5 and wish to find the estimate of $\mu_5$. Since $\mu_5 = \lambda_5 + 10\lambda_3\lambda_2$,

$E^{-1}[\mu_5] = M_5 = E^{-1}[\lambda_5] + 10E^{-1}[\lambda_3\lambda_2] = L_5 + 10L_{32}.$

In table II are given the estimates of all Thiele seminvariants and all products
of Thiele seminvariants of weight ≤ 8. From this table the expressions for $L_5$
and $L_{32}$ are obtained and, by taking the combination indicated above, it is
seen that

$n^{(5)} M_5 = (n^4 - 5n^3 + 10n^2)(5) - 5(n^3 - 5n^2 + 10n)(4)(1)$

$- 10(n^2 - 2n)(3)(2) + 10(n^2 - 4n + 8)(3)(1)^{2}$

$+ 30(n - 2)(2)^{2}(1) - 10n(2)(1)^{3} + 4(1)^{5},$

a result which checks with that given by Dwyer [13; 27]. In similar fashion
the estimate of any other seminvariant of weight ≤ 8 can be obtained by use
of table II.
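The power sum expression for $M_5$ can be compared with the estimate formed directly from power products by relation (19), starting from $\mu_5 = \mu'_5 - 5\mu'_4\mu'_1 + 10\mu'_3\mu_1'^2 - 10\mu'_2\mu_1'^3 + 4\mu_1'^5$. In this sketch the coefficient of (3)(2) is taken as $-10(n^2 - 2n)$, which is the value the direct computation reproduces; the helper names are invented for the illustration.

```python
from fractions import Fraction
from itertools import permutations


def falling(n, k):
    """n^(k) = n (n - 1) ... (n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def power_sum(k, xs):
    return sum(x ** k for x in xs)


def power_product(exponents, xs):
    total = 0
    for chosen in permutations(xs, len(exponents)):
        term = 1
        for x, p in zip(chosen, exponents):
            term *= x ** p
        total += term
    return total


def M5_from_power_products(xs):
    """Unbiased estimate of mu_5 written with power products."""
    n = len(xs)
    return (Fraction(power_product((5,), xs), falling(n, 1))
            - 5 * Fraction(power_product((4, 1), xs), falling(n, 2))
            + 10 * Fraction(power_product((3, 1, 1), xs), falling(n, 3))
            - 10 * Fraction(power_product((2, 1, 1, 1), xs), falling(n, 4))
            + 4 * Fraction(power_product((1, 1, 1, 1, 1), xs), falling(n, 5)))


def M5_from_power_sums(xs):
    """The same estimate in terms of power sums."""
    n = len(xs)
    s1, s2, s3, s4, s5 = (power_sum(k, xs) for k in range(1, 6))
    numerator = ((n ** 4 - 5 * n ** 3 + 10 * n ** 2) * s5
                 - 5 * (n ** 3 - 5 * n ** 2 + 10 * n) * s4 * s1
                 - 10 * (n ** 2 - 2 * n) * s3 * s2
                 + 10 * (n ** 2 - 4 * n + 8) * s3 * s1 ** 2
                 + 30 * (n - 2) * s2 ** 2 * s1
                 - 10 * n * s2 * s1 ** 3
                 + 4 * s1 ** 5)
    return Fraction(numerator, falling(n, 5))


sample = [1, 2, 4, 7, 11, 16]   # integer observations, at least five of them
same = M5_from_power_products(sample) == M5_from_power_sums(sample)
```

A disagreement between the two functions on any sample would point to a wrong coefficient in the power sum form, so the comparison also serves as a check of the kind described in the next section.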


6. Computation Checks. There are a number of checks which can be applied
to the entries in table II. These may be of interest simply as properties of the
estimates, and they may be of use in correcting errors which may possibly have
crept into the tables.

When any power product of more than one part is expanded into power
sums, the sum of the numerical coefficients of the expansion is zero. To prove
this we need only to consider a set of observations of which one observation is
unity and the rest are all zero. Then any power product of two or more parts
is necessarily zero and all power sums are equal to unity. Hence the initial
statement of the paragraph follows immediately.

From this fact it is apparent that the sum of the coefficients of $L_r$ is $1/n$ and
the sum of the coefficients of $L_{r_1 r_2 \cdots r_s}$ is zero. Thus for $L_4$ we have

$\frac{n^3 + n^2 - 4(n^2 + n) - 3(n^2 - n) + 12n - 6}{n^{(4)}} = \frac{1}{n},$

and for $L_{22}$ the sum of the coefficients is

$\frac{1}{n^{(4)}} [-n^2 + n + 4n - 4 + n^2 - 3n + 3 - 2n + 1] = 0.$


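TABLE II

Estimates of All Thiele Seminvariants and Their Products of Weight ≤ 8

w = 4

Term        $n^{(4)} L_4$     $n^{(4)} L_{22}$
(4)         $n^3 + n^2$       $-(n^2 - n)$
(3)(1)      $-4(n^2 + n)$     $4(n - 1)$
(2)^2       $-3(n^2 - n)$     $n^2 - 3n + 3$
(2)(1)^2    $12n$             $-2n$
(1)^4       $-6$              $1$

[The entries of Table II for the other weights are not recoverable from this copy.]
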



A condition satisfied by the coefficients of any seminvariant is that their sum
is equal to zero (See section 2). This provides another check on the entries of
table II, although the seminvariant must be written in homogeneous form
before the check is applied. Thus we may write

$L_4 = \frac{n^3}{n^{(4)}} \left[ (n + 1)\frac{(4)}{n} - 4(n + 1)\frac{(3)(1)}{n^2} - 3(n - 1)\frac{(2)^2}{n^2} + 12n\frac{(2)(1)^2}{n^3} - 6n\frac{(1)^4}{n^4} \right],$

and the sum of coefficients is

$(n + 1) - 4(n + 1) - 3(n - 1) + 12n - 6n = 0.$

Several checks arise from the fact (see section 6) that every seminvariant
must be annihilated by the operator

(24) $\Omega' = \sum_{i=1} i\, s_{i-1} \frac{\partial}{\partial s_i}.$

Another check results from the discussion of the next section and is so apparent
as to need no comment.

All the checks mentioned in this section are applicable to the estimate of any
seminvariant.
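The coefficient checks of this section lend themselves to a quick numerical confirmation. The sketch below evaluates the three sums for a range of sample sizes and also applies the argument of the text directly, computing $L_4$ for a sample in which one observation is unity and the rest are zero. The function names are illustrative.

```python
from fractions import Fraction


def falling(n, k):
    """n^(k) = n (n - 1) ... (n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def coefficient_sum_L4(n):
    """Sum of the coefficients of the power sum terms of L_4."""
    total = n ** 3 + n ** 2 - 4 * (n ** 2 + n) - 3 * (n ** 2 - n) + 12 * n - 6
    return Fraction(total, falling(n, 4))


def coefficient_sum_L22(n):
    """Sum of the coefficients of the power sum terms of L_22."""
    total = -n ** 2 + n + 4 * n - 4 + n ** 2 - 3 * n + 3 - 2 * n + 1
    return Fraction(total, falling(n, 4))


def homogeneous_sum_L4(n):
    """Sum of the coefficients of L_4 written in homogeneous form."""
    return (n + 1) - 4 * (n + 1) - 3 * (n - 1) + 12 * n - 6 * n


def L4(xs):
    """L_4 from power sums, for use with the special sample (1, 0, ..., 0)."""
    n = len(xs)
    s1, s2, s3, s4 = (sum(x ** k for x in xs) for k in (1, 2, 3, 4))
    numerator = (n * n * (n + 1) * s4 - 4 * n * (n + 1) * s3 * s1
                 - 3 * n * (n - 1) * s2 ** 2 + 12 * n * s2 * s1 ** 2 - 6 * s1 ** 4)
    return Fraction(numerator, falling(n, 4))


checks_hold = all(
    coefficient_sum_L4(n) == Fraction(1, n)
    and coefficient_sum_L22(n) == 0
    and homogeneous_sum_L4(n) == 0
    and L4([1] + [0] * (n - 1)) == Fraction(1, n)
    for n in range(4, 13)
)
```

The same pattern applies to any entry of the table: substitute the special sample, for which every power sum equals one, and compare the result with $1/n$ or with zero.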


7. Estimates as Sums of Simple Seminvariants. A seminvariant such as $L_4$
in which the coefficients of the $m'$'s are functions of n will be called a composite
seminvariant, while a seminvariant in which the coefficients of the $m'$'s are
purely numerical will be called simple. The fact that is to be established in
this section is that every composite seminvariant is the sum of simple
seminvariants. As an illustration consider $L_4$. It is apparent that

$L_4 = \frac{n^4}{n^{(4)}}\, l_4 + \frac{n^3}{n^{(4)}}\, k_4,$

where $l_4$ and $k_4$ are seminvariants of the sample corresponding to $\lambda_4$ and $\kappa_4$.
Both $l_4$ and $k_4$ are simple seminvariants.

That a composite seminvariant may always be expressed as a sum of simple
seminvariants can be demonstrated by considering the effect of $\Omega'$, (24), on a
composite seminvariant. The coefficients are polynomials in n and are
unaffected by the operator. The expression resulting from application of the
operator can vanish only if the coefficient of $n^r$ vanishes for every r. Thus a
composite seminvariant which has r different powers of n appearing in its
coefficients is expressible as the sum of r simple seminvariants, which are not
necessarily distinct. Table III exhibits the estimates of Thiele seminvariants of
weight ≤ 6 as sums of simple seminvariants.

Since the factors appearing in front of each of the simple seminvariants in
the expression resulting from breaking down a composite seminvariant are of
successively lower order with respect to n, it is possible to obtain approximations
of various orders to the value of an estimate by using the appropriate
portion of the expression given in the table.

TABLE III

Estimates of Thiele Seminvariants of Weight ≤ 6 as Sums of Simple Seminvariants

[The body of Table III is not recoverable from this copy.]


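8. The Estimates of the κ's. The seminvariant $\kappa_r$ possesses an interesting
property which will be called invariance under estimate. By this is meant that
the estimate of $\kappa_r$ is $k_r$ multiplied by a suitable factor. In particular, $k_2 = m_2$ and
$k_3 = m_3$ and it is well known that

$E^{-1}[\mu_2] = \frac{n^2}{n^{(2)}}\, m_2, \quad E^{-1}[\mu_3] = \frac{n^3}{n^{(3)}}\, m_3,$

so that $\kappa_r$ certainly possesses the property for r = 2 and 3. It can be shown,
however, that

(25) $K_{2r} = \frac{n^2}{n^{(2)}}\, k_{2r}, \quad K_{2r+1} = \frac{n^3}{n^{(3)}}\, k_{2r+1}.$

From (15)

$\kappa_{2r} = \frac{1}{2} \sum_{i=0}^{2r} (-1)^{i} \binom{2r}{i} \mu'_i\, \mu'_{2r-i},$

so that

$K_{2r} = \frac{(2r)}{n} + \frac{1}{2} \sum_{i=1}^{2r-1} (-1)^{i} \binom{2r}{i} \frac{(i,\, 2r - i)}{n^{(2)}}.$

By the Binet-Waring identities [15; 6-7]

(26) $(a\, b) = (a)(b) - (a + b)$

and this holds for power products regardless of the values of a and b. Hence

$K_{2r} = \frac{(2r)}{n} - \frac{1}{2} \sum_{i=1}^{2r-1} (-1)^{i} \binom{2r}{i} \frac{(2r)}{n^{(2)}} + \frac{1}{2} \sum_{i=1}^{2r-1} (-1)^{i} \binom{2r}{i} \frac{(i)(2r - i)}{n^{(2)}}.$

Since

$\sum_{i=1}^{2r-1} (-1)^{i} \binom{2r}{i} = -2,$

the coefficient of (2r) above is $\frac{1}{n} + \frac{1}{n^{(2)}} = \frac{n}{n^{(2)}}$ and it follows immediately that

$K_{2r} = \frac{1}{2} \sum_{i=0}^{2r} (-1)^{i} \binom{2r}{i} \frac{(i)(2r - i)}{n^{(2)}} = \frac{n^2}{n^{(2)}}\, k_{2r}.$

This proves the first half of (25) and the second half can be proved in similar
fashion, although with considerably more difficulty.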
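Both halves of (25) can be tested on data for small weights. The following sketch assumes the forms of $\kappa_{2r}$ and $\kappa_{2r+1}$ given in (15), with $\mu'_0 = 1$, and forms the estimate term by term from relation (19); a factor $\mu'_0$ is simply dropped before the power product is taken. The helper names (`kappa_terms`, `estimate`, `sample_value`) are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def falling(n, k):
    """n^(k) = n (n - 1) ... (n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def power_sum(k, xs):
    return sum(x ** k for x in xs)


def power_product(exponents, xs):
    total = 0
    for chosen in permutations(xs, len(exponents)):
        term = 1
        for x, p in zip(chosen, exponents):
            term *= x ** p
        total += term
    return total


def kappa_terms(w):
    """Terms (coefficient, orders of the raw moments) of kappa_w as in (15)."""
    r = w // 2
    if w % 2 == 0:
        return [(Fraction((-1) ** i * comb(w, i), 2), (i, w - i))
                for i in range(w + 1)]
    terms = [((-1) ** (i + r) * comb(2 * r, i + r) * Fraction(2 * i + 1, i + r + 1),
              (r - i, r + i + 1)) for i in range(r + 1)]
    terms += [(Fraction((-1) ** (i + 1) * comb(2 * r, i)), (1, i, 2 * r - i))
              for i in range(2 * r + 1)]
    return terms


def estimate(terms, xs):
    """Unbiased estimate: each moment product becomes a scaled power product."""
    n = len(xs)
    total = Fraction(0)
    for coeff, orders in terms:
        parts = tuple(j for j in orders if j > 0)      # mu'_0 = 1 is dropped
        total += coeff * Fraction(power_product(parts, xs), falling(n, len(parts)))
    return total


def sample_value(terms, xs):
    """The same moment function evaluated with the sample moments."""
    n = len(xs)
    total = Fraction(0)
    for coeff, orders in terms:
        term = Fraction(coeff)
        for j in orders:
            term *= Fraction(power_sum(j, xs), n)
        total += term
    return total


sample = [1, 3, 4, 8, 9, 15]
n = len(sample)
ratio_even = estimate(kappa_terms(4), sample) / sample_value(kappa_terms(4), sample)
ratio_odd = estimate(kappa_terms(5), sample) / sample_value(kappa_terms(5), sample)
```

Under these assumptions `ratio_even` depends on the data only through n, being $n^2/n^{(2)}$, and `ratio_odd` is $n^3/n^{(3)}$.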





9. Other Simple Seminvariants which are Invariant under Estimate. It has 

been previously remarked (Chapter I, section 2) that the * system of semin- 
variants are the seminvariants of minimum degree, those of even weight being of 
second degree and those of odd weight being of third degree. The **r’s are the 
only seminvariants of degree 2, but for odd weights greater than 7, there exist 
more than one seminvariant of degree 3. It is not difficult to show that these 
additional minimum degree seminvariants are also invariant under estimate. 
The type of proof used could have been applied equally well to obtain the results 
of the preceding section and indicates that the property of invariance under 
estimate which is possessed by the k's is a direct result of their minimum degree 
property. 

Consider the estimate in power product form of any seminvariant of degree 3
and odd weight. Power products of 1, 2 and 3 parts will appear. By the
Binet-Waring identities each three part power product (abc) yields a third degree
power sum product (a)(b)(c) plus other products of lower degree. Since (a)(b)(c)
comes only from (abc), its coefficient must be identical with that of (abc) and will
therefore be a constant divided by $n^{(3)}$. The coefficient of each second degree
product of power sums will be a sum of terms, the first of which comes from the
corresponding two part power product with a coefficient identical with that of the
power product, and the others come from the three part power products. Then
the coefficient of a second degree product of power sums must be of the form

$$\frac{c_1}{n^{(2)}} + \frac{c_2 + c_3 + \cdots + c_t}{n^{(3)}} = \frac{c_1 n + c_2}{n^{(3)}},$$

the constants being relabeled on the right. Similarly the coefficient of the first
degree power sum term will be of the form

$$\frac{d_1 n^2 + d_2 n + d_3}{n^{(3)}}.$$

Since the estimate of a seminvariant is a seminvariant, it follows that $d_3 = 0$.
This is true because the coefficient of $(r-1)(1)/n^2$ must be the coefficient of
$(r)/n$ multiplied by $-r$. Furthermore $c_2 = d_2 = 0$, for if the contrary be assumed
it is immediately possible to break the composite seminvariant into two simple
seminvariants, the first being of degree 3 (the original seminvariant) and the
second of degree 2. Since for odd weights no seminvariant of degree 2 exists,
it follows that any seminvariant of degree 3 and odd weight is invariant under
estimate. It is also apparent that the factor $n^3/n^{(3)}$ must appear in the estimate.
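The simplest case of this section, weight 3, can be checked numerically. The sketch below assumes the factor in the closing sentence is $n^3/n^{(3)}$ and compares Fisher's $k_3$ with the third seminvariant written in sample moments about the origin. The function names and the data are illustrative only and do not come from the paper.

```python
from fractions import Fraction

def falling(n, r):
    """Falling factorial n^(r) = n (n-1) ... (n-r+1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

def sample_lambda3(xs):
    """Third seminvariant m3 - 3 m2 m1 + 2 m1^3 in sample moments about the origin."""
    n = len(xs)
    m1, m2, m3 = (Fraction(sum(x ** p for x in xs), n) for p in (1, 2, 3))
    return m3 - 3 * m2 * m1 + 2 * m1 ** 3

def k3(xs):
    """Fisher's k3 = n * sum((x - mean)^3) / ((n - 1)(n - 2))."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return n * sum((x - mean) ** 3 for x in xs) / ((n - 1) * (n - 2))

xs = [2, 3, 5, 7, 11, 13]                  # arbitrary illustrative sample
n = len(xs)
factor = Fraction(n ** 3, falling(n, 3))   # n^3 / n^(3)
print(k3(xs) == factor * sample_lambda3(xs))
```

Exact rational arithmetic is used so that the comparison is an identity check rather than a floating point approximation.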


10. Composite Seminvariants which are Invariant under Estimate. For
each weight $r \geq 4$ there exists a composite seminvariant which is invariant
under estimate. For weights 4 and 5 this seminvariant is easily obtained by use
of Table III. Thus for weight 4, form the seminvariant $\lambda_4 + c_{22}\lambda_2^2$. From the
table we find that

£_IfX4 + C ** X * J * U + * 4 + ** 5® ^ “ C ” (n-*2) < « * 4 

*= -^5 (It + CsjI*) + (n 2 — n™ Cn)ki. 

n f4) n r4) 

If $c_{22} = n^2/n^{(2)}$ the seminvariant is invariant under estimate. This seminvariant
is

(27) *i = X4 + 4 X >- 

In similar fashion we find for weight 5 

( 28 ) — Xi + - jjj A»X|. 

For weights greater than 5 considerably more difficulty is encountered. For
weight 6, for example, we consider the seminvariant

$$\lambda_6 + c_{42}\lambda_4\lambda_2 + c_{33}\lambda_3^2 + c_{222}\lambda_2^3.$$

By use of Table III we obtain

$$E^{-1}[\lambda_6 + c_{42}\lambda_4\lambda_2 + c_{33}\lambda_3^2 + c_{222}\lambda_2^3] = \frac{n^6}{n^{(6)}}\,(l_6 + c_{42}l_4l_2 + c_{33}l_3^2 + c_{222}l_2^3) + \Phi,$$


where $\Phi$ is a sum of other seminvariants with coefficients which are functions of
$n$ and $c_{42}$, $c_{33}$, $c_{222}$. Now there are only four linearly independent seminvariants
of weight 6 and it is necessary that one of these involve the term $(1)^6/n^6$. By an
argument analogous to that of the previous section this term cannot appear in
$\Phi$ and therefore $\Phi$ is expressible in terms of three or fewer seminvariants.
Actually three are necessary and equating the coefficients of these to zero the values
of $c_{42}$, $c_{33}$ and $c_{222}$ are uniquely determined. The result is somewhat lengthy
and scarcely of sufficient interest to record here.

The same sort of procedure can be used for determining seminvariants of 
higher order which are invariant under estimate, but the labor of computation 
becomes very great. 

It is possible to obtain moment functions which are invariant under estimate
by means of a set of equations given by Dwyer [13; 38-39]. These equations
connect the coefficients of a general isobaric moment function and the coefficients
of the expected value of that function. In his notation if, for example,

$$f_4 = a_4(4) + 4a_{31}(3)(1) + 3a_{22}(2)^2 + 6a_{211}(2)(1)^2 + a_{1111}(1)^4,$$

then

$$E[f_4] = b_4n\mu_4 + 4b_{31}n^{(2)}\mu_3\mu_1 + 3b_{22}n^{(2)}\mu_2^2 + 6b_{211}n^{(3)}\mu_2\mu_1^2 + b_{1111}n^{(4)}\mu_1^4,$$

wherein

$$\begin{aligned}
a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111} &= b_4,\\
a_{31} + 3a_{211} + a_{1111} &= b_{31},\\
a_{22} + 2a_{211} + a_{1111} &= b_{22},\\
a_{211} + a_{1111} &= b_{211},\\
a_{1111} &= b_{1111}.
\end{aligned} \tag{29}$$
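Relations (29) can be checked by exact enumeration. The sketch below assumes that $(r)$ denotes the power sum $\sum x^r$, that $\mu_r$ is the population moment about the origin, and that the sample consists of $n$ independent draws. The three-point population and the coefficient values are arbitrary illustrations, not data from the paper.

```python
from fractions import Fraction
from itertools import product

def falling(n, r):
    """Falling factorial n^(r)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

values = [0, 1, 3]                                     # hypothetical population
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4
a4, a31, a22, a211, a1111 = 2, -3, 5, 7, -1            # arbitrary coefficients

def f4(xs):
    """The weight 4 moment function written in power sums."""
    s1, s2, s3, s4 = (sum(x ** p for x in xs) for p in (1, 2, 3, 4))
    return (a4 * s4 + 4 * a31 * s3 * s1 + 3 * a22 * s2 ** 2
            + 6 * a211 * s2 * s1 ** 2 + a1111 * s1 ** 4)

# Exact expectation over all 3^4 ordered samples.
expect = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    weight = Fraction(1)
    for i in idx:
        weight *= probs[i]
    expect += weight * f4([values[i] for i in idx])

mu = {r: sum(p * v ** r for v, p in zip(values, probs)) for r in (1, 2, 3, 4)}

# The b coefficients according to (29).
b4 = a4 + 4 * a31 + 3 * a22 + 6 * a211 + a1111
b31 = a31 + 3 * a211 + a1111
b22 = a22 + 2 * a211 + a1111
b211 = a211 + a1111
b1111 = a1111

rhs = (b4 * n * mu[4] + 4 * b31 * falling(n, 2) * mu[3] * mu[1]
       + 3 * b22 * falling(n, 2) * mu[2] ** 2
       + 6 * b211 * falling(n, 3) * mu[2] * mu[1] ** 2
       + b1111 * falling(n, 4) * mu[1] ** 4)
print(expect == rhs)
```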


The problem at hand demands that 


-[• 


jjii (4),, , (3) ( 1 ) 

& I na 4 -h 4n o 8 i —r— 

n n £ 


. (2)“ , n a (2)(1) 2 , 4 

+ on 0*2 + on o*n —— + n dim 


7r 


n 4 


a) 4- 

n 4 _ 


= X[na 4 /i4 + 4n (S) asiM3W + 3n' i) aant i + 67i <3) a 2 uM2Mi 2 + ft^OiuiMi 4 ] 


so that the equations (29) become

$$\begin{aligned}
n^4a_{1111} &= \lambda n^{(4)}a_{1111},\\
n^3a_{211} &= \lambda n^{(3)}(a_{211} + a_{1111}),\\
n^2a_{22} &= \lambda n^{(2)}(a_{22} + 2a_{211} + a_{1111}),\\
n^2a_{31} &= \lambda n^{(2)}(a_{31} + 3a_{211} + a_{1111}),\\
na_4 &= \lambda n(a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111}),
\end{aligned}$$

and from these equations $a_4$, $a_{31}$, $a_{22}$, $a_{211}$ can be found in terms of $a_{1111}$.
Obviously there is only one solution if none of the $a$'s are zero. In general, for any
weight $r$, a similar system of equations can be found and they determine the
coefficients of a moment function of weight $r$ which is invariant under estimate.
It appears that this moment function is always a seminvariant although no
proof of the fact has been found. The moment functions of weight 4, 5 and 6
obtained by this method are identical with the corresponding seminvariants
defined above.
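The weight 4 system can be solved one equation at a time. The sketch below assumes the five relations read as $n^4a_{1111} = \lambda n^{(4)}a_{1111}$, $n^3a_{211} = \lambda n^{(3)}(a_{211} + a_{1111})$, and so on down to the equation for $a_4$, and it normalizes $a_{1111} = 1$. The function name `invariant_coefficients` is hypothetical.

```python
from fractions import Fraction

def falling(n, r):
    """Falling factorial n^(r)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

def invariant_coefficients(n):
    """Solve the weight 4 system for a sample size n >= 4, taking a_1111 = 1."""
    lam = Fraction(n ** 4, falling(n, 4))      # from the first equation
    a1111 = Fraction(1)

    def solve(power, order, rest):
        # n^power * a = lam * n^(order) * (a + rest)  gives  a = g*rest / (n^power - g)
        g = lam * falling(n, order)
        return g * rest / (n ** power - g)

    a211 = solve(3, 3, a1111)
    a22 = solve(2, 2, 2 * a211 + a1111)
    a31 = solve(2, 2, 3 * a211 + a1111)
    a4 = solve(1, 1, 4 * a31 + 3 * a22 + 6 * a211 + a1111)
    return lam, a4, a31, a22, a211, a1111

lam, a4, a31, a22, a211, a1111 = invariant_coefficients(10)
print(lam, a4, a31, a22, a211)
```

Each coefficient is determined by those already found, which illustrates why the solution is unique once $a_{1111}$ is fixed and nonzero.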


Conclusion. The results of this paper include:

1. A demonstration of the fact that the theory of statistical seminvariants is
identical with the theory of algebraic seminvariants.

2. The introduction of new statistical seminvariants.

3. Simplification of the computation of estimates.

4. Proof that the estimate of any seminvariant is also a seminvariant.

5. Proof of the existence of a trio of seminvariants with the same numerical
coefficients.

6. A discussion of seminvariants which are invariant under estimate.

Many thanks are due Professor P. S. Dwyer for his able guidance in the
preparation of this paper and to Professors C. C. Craig and J. A. Nyswander for
helpful comments.





REFERENCES

[1] L. E. Dickson, Algebraic Invariants (1914).

[2] E. B. Elliott, An Introduction to the Algebra of Quantics (1895). Reference is to the
second edition (1918).

[3] J. Hammond, "On the Solution of the Differential Equation of Sources," Amer. Jour.
of Math., Vol. 5 (1882), pp. 218-227.

[4] P. A. MacMahon, "Seminvariants and Symmetric Functions," Amer. Jour. of Math.,
Vol. 6 (1884), pp. 131-163.

[5] Burnside and Panton, Theory of Equations (1881). Reference is to the 1904 edition,
Vol. II.

[6] P. S. Dwyer, "Combined Expansions of Products of Symmetric Power Sums and
Sums of Symmetric Power Products with Applications to Sampling." Part I,
Annals of Math. Stat., Vol. IX, 1, (1938), pp. 1-47. Part II, Vol. IX, 2, (1938),
pp. 97-132.

[7] T. N. Thiele, Theory of Observations (1903).

[8] N. P. Bertilsen, "On the Compatibility of Frequency Constants and on Presumptive
Laws of Errors," Skandinavisk Aktuarietidskrift, Vol. 10 (1927), pp. 129-150.

[9] A. Cayley, "A Memoir on Seminvariants," Amer. Jour. of Math., Vol. 7 (1885),
pp. 1-25.

[10] T. N. Thiele, Almindelig Iagttagelseslaere (1889).

[11] A. A. Tschuprow, Grundbegriffe und Grundprobleme der Korrelationstheorie (1925).

[12] R. A. Fisher, "Moments and Product Moments of Sampling Distributions," Proc.
London Math. Soc., Vol. 2 (30), (1929), pp. 199-238.

[13] P. S. Dwyer, "Moments of Any Rational Integral Isobaric Sample Moment Function,"
Annals of Math. Stat., Vol. 8 (1937), pp. 21-65.

[14] J. R. Roe, "Interfunctional Expressibility Tables of Symmetric Functions."
Distributed by Syracuse University (1931).

[15] Faà de Bruno, Théorie des Formes Binaires (1870).

University of Michigan, 

Ann Arbor, Michigan. 



THE ERRORS INVOLVED IN EVALUATING CORRELATION 
DETERMINANTS 

By Paul G. Hoel 


1. Introduction. Many statistical problems require for their solution the
evaluation of correlation determinants. The method usually employed for such
evaluation is that of Chio,¹ in which the order of the determinant is reduced by
successive operations with selected pivotal elements. The repeated multiplications
and subtractions involved in the method necessitate rounding off the
elements in the successively reduced determinants. The calculated value of the
original determinant is therefore in error; and so the question naturally arises
as to the magnitude of this error.

Previous attempts to answer this question seem to be satisfied with finding 
an upper bound for the magnitude of the difference between the value of the 
original determinant and its value after its elements have been rounded off. 
Moreover, this bound is expressed in terms of the errors in the elements and the 
minors of the original determinant, whose values are assumed to be known 
exactly from calculation. However, several reductions are often needed before 
the value of the determinant can be obtained; and furthermore the minors are 
subject to the same type of errors as the determinant itself. The problem, 
therefore, is to find an upper bound for the magnitude of the difference between 
the final calculated value of the determinant and the determinant itself which 
involves only calculated quantities. 

This paper treats the problem from two different points of view. In the first 
part an upper bound is obtained for the magnitude of the error. In the second 
part the first order error terms are given more detailed consideration, with the 
result that an upper probability bound is obtained for the error. 


2. Absolute Bounds. Consider the correlation determinant $A = |a_{ij}|$. To
evaluate $A$ by the method of Chio, it is convenient to select diagonal elements
as pivots. It will be assumed without loss of generality that the upper left
diagonal element is always chosen as the pivotal element in each reduction.
After each reduction, elements are rounded off to a fixed decimal accuracy.
Let $a^k_{ij}$ represent the element $i, j$ after the $k$-th reduction, and $x^k_{ij}$ the difference
between the rounded value of element $a^k_{ij}$ and $a^k_{ij}$ itself. After $k$ reductions, we
arrive at the determinant

$$F^k = \begin{vmatrix} a^k_{k+1,k+1} + x^k_{k+1,k+1} & \cdots & a^k_{k+1,n} + x^k_{k+1,n} \\ \vdots & & \vdots \\ a^k_{n,k+1} + x^k_{n,k+1} & \cdots & a^k_{nn} + x^k_{nn} \end{vmatrix}.$$

1 See for example, Whittaker and Robinson Calculus of Observations, p. 71. 



By treating $F^k$ as a function of the $x^k$, it may be expanded by Taylor's formula
as follows:

$$F^k = A^k + \sum x^k_{ij}A^k_{ij} + \frac{1}{2!}\sum\sum x^k_{ij}x^k_{pq}A^k_{ijpq} + \cdots, \tag{1}$$

where $A^k$ is the value of $F^k$ for all $x^k$ zero, $A^k_{ij}$ is the cofactor of $a^k_{ij}$ in $A^k$, etc.

For a determinant of order $n$, the value of the determinant obtained after a
single reduction is the value of the original determinant multiplied by the
$n - 2$ power of the pivotal element used. Applying this to $F^k$, it follows that

$$\begin{aligned}
A^k &= (a^{k-1}_{kk} + x^{k-1}_{kk})^{n-k-1}F^{k-1} = H_k^{n-k-1}F^{k-1},\\
A^k_{ij} &= H_k^{n-k-2}F^{k-1}_{ij},\\
A^k_{ijpq} &= H_k^{n-k-3}F^{k-1}_{ijpq},
\end{aligned}$$

etc., where the exponents of $H_k$ are ordinary exponents rather than notation.
Substituting in (1),

$$F^k = H_k^{n-k-1}F^{k-1} + H_k^{n-k-2}\sum x^k_{ij}F^{k-1}_{ij} + \frac{1}{2!}H_k^{n-k-3}\sum\sum x^k_{ij}x^k_{pq}F^{k-1}_{ijpq} + \cdots.$$
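The reduction property quoted above (one reduction multiplies the determinant by the $n - 2$ power of the pivot) can be checked directly. The sketch below is a minimal illustration of one Chio step with the upper left element as pivot, using exact fractions and no rounding. The matrix entries and the helper names `chio_step` and `det` are illustrative assumptions.

```python
from fractions import Fraction

def chio_step(mat):
    """One reduction with the upper left pivot: entries a11*aij - ai1*a1j."""
    p = mat[0][0]
    return [[p * row[j] - row[0] * mat[0][j] for j in range(1, len(mat))]
            for row in mat[1:]]

def det(mat):
    """Determinant by expansion along the first row (adequate for small orders)."""
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([r[:j] + r[j + 1:] for r in mat[1:]])
               for j in range(len(mat)))

# A small symmetric matrix with unit diagonal, chosen only for illustration.
R = [[Fraction(v) for v in row] for row in
     [[1, "1/2", "1/3", "1/4"],
      ["1/2", 1, "1/5", "1/6"],
      ["1/3", "1/5", 1, "1/7"],
      ["1/4", "1/6", "1/7", 1]]]
n = len(R)

reduced = chio_step(R)          # order n - 1, pivot was R[0][0] = 1
reduced2 = chio_step(reduced)   # order n - 2, pivot was reduced[0][0] = 3/4

print(det(reduced) == R[0][0] ** (n - 2) * det(R))
print(det(reduced2) == reduced[0][0] ** (n - 3) * det(reduced))
```

In the paper's setting each reduced array is rounded before the next step, which is what introduces the $x^k_{ij}$; here nothing is rounded, so the identity holds exactly.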




In order to express $F^k$ in terms of the original determinant, this expansion
will be condensed by means of the following operational notation:

$$F^k = (1 + D + D^2 + \cdots + D^{n-k})H_k^{n-k-1}F^{k-1}, \tag{2}$$

where $D^i$ operates on $H_k^{n-k-1}F^{k-1}$ by reducing the exponent of $H_k^{n-k-1}$ by $i$ units,
by summing from $k + 1$ to $n$ the product of $i$ terms in $x^k$ with the corresponding
cofactors of $F^{k-1}$, and dividing the result by factorial $i$. Using this as a recursion
formula,

$$F^k = (1 + D + \cdots + D^{n-k})H_k^{n-k-1}(1 + \cdots + D^{n-k+1})H_{k-1}^{n-k}\cdots(1 + \cdots + D^{n-1})H_1^{n-2}F^0.$$

However,

$$F^0 = \begin{vmatrix} a_{11} + x_{11} & \cdots \\ \vdots & a_{nn} + x_{nn} \end{vmatrix} = A,$$

since we assume that $x_{ij} = 0$ for our original determinant. Consequently,

$$F^k = (1 + \cdots + D^{n-k})H_k^{n-k-1}(1 + \cdots + D^{n-k+1})H_{k-1}^{n-k}\cdots(1 + \cdots + D^{n-1})H_1^{n-2}A. \tag{3}$$

Since $D^i$ operates on $F^{k-1}$ in (2) to extract the proper cofactor of $i$ less rows than
in $F^{k-1}$, which in turn reduces the exponent of all factors $H_{k-1}$ in the expansion
of $F^{k-1}$ by $i$ units, $D^i$ reduces the exponent of all $H$'s following it in the expansion
of $F^k$ in (3) by $i$ units.





Following these rules of operation, and expanding so as to collect terms of the
same degree in the $x$'s, we may write

$$F^k = H_k^{n-k-1}\cdots H_1^{n-2}A + H_k^{n-k-2}\cdots H_1^{n-3}(\text{terms in } x_{ij}) + H_k^{n-k-3}\cdots H_1^{n-4}(\text{terms in } x_{ij}x_{pq}) + \cdots. \tag{4}$$

Letting $H = H_kH_{k-1}\cdots H_1$ and $C = H_k^{n-k-1}\cdots H_1^{n-2}$, we may write

$$F^k = C\left[A + \frac{1}{H}(\text{terms in } x_{ij}) + \frac{1}{H^2}(\text{terms in } x_{ij}x_{pq}) + \cdots\right];$$

and hence

$$J = \frac{F^k}{C} - A = \frac{1}{H}(\text{terms in } x_{ij}) + \frac{1}{H^2}(\text{terms in } x_{ij}x_{pq}) + \cdots. \tag{5}$$

Now $J$ is the difference between the calculated value of $A$, using Chio's reduction
method and rounding off after each reduction, and the true value of $A$.
We are interested in finding an upper bound for the magnitude of $J$. To accomplish
this we shall first overestimate the number of terms in the various
sums of (5), then find an upper bound for the magnitude of the terms in these
sums, and finally combine the two results.

In counting terms by means of (3), we may ignore the $H$'s since they merely
serve as coefficients of the $x$'s. Therefore consider the nature of the terms in

$$(1 + \cdots + D^{n-k})(1 + \cdots + D^{n-k+1})\cdots(1 + \cdots + D^{n-1})A.$$

Now $(1 + \cdots + D^s)A$ contains the sums $\sum_{n-s+1}^{n} x_{ij}A_{ij}$, $\frac{1}{2!}\sum\sum_{n-s+1}^{n} x_{ij}x_{pq}A_{ijpq}$, etc.;
hence it contains $s^2$ terms in $x_{ij}$, $\frac{s^2(s-1)^2}{2!}$ terms in $x_{ij}x_{pq}$, etc. Each of these
is not greater than $s^2$, ${}_{s^2}C_2$, etc.; consequently, the number of terms of each type
is not greater than the coefficient of the corresponding power of $D$ in the expansion
of $(1 + D)^{s^2}$. Therefore,

$$(1 + D)^{(n-k)^2}(1 + D)^{(n-k+1)^2}\cdots(1 + D)^{(n-1)^2} = (1 + D)^m, \tag{6}$$

where $m = (n - k)^2 + \cdots + (n - 1)^2$, contains at least as many terms of each
type as are found in the expansion of $F^k$. This gives us the desired overestimate
of the number of terms in the various sums of (5).

In finding upper bounds for the magnitudes of terms, it is to be noted that (4)
is written with all common factors extracted from each set of terms of the same
degree in the $x$'s. In the parenthesis containing terms consisting of the product
of $r$ $x$'s, the first sum will have unity for its coefficient while the last sum will have
$H_k^rH_{k-1}^r\cdots H_2^r$ as coefficient, with all sums between having as coefficients products
of $H$'s with exponents $\le r$. Hence an upper bound for all coefficients in
this parenthesis may be written as $R^r$, where $R$ is the magnitude of the product
of those $H$'s whose magnitude is greater than unity, but unity if none exceeds
unity. Now terms in $x_{ij}$ are multiplied by $A_{ij}$, those in $x_{ij}x_{pq}$ by $A_{ijpq}$, etc.;
therefore let $A_{ij}$, $A_{ijpq}$, etc., be the absolute values of the largest in magnitude
of such cofactors. With this notation for upper bounds for magnitudes of
terms, and (6) giving an upper bound for the number of terms, we may write an
upper bound for the magnitude of $J$ as follows:

$$|J| < \frac{\epsilon R}{H}\,m\,A_{ij} + \left(\frac{\epsilon R}{H}\right)^2{}_mC_2\,A_{ijpq} + \cdots, \tag{7}$$

where $\epsilon \ge |x|$ is the maximum error of rounding. This result is valid for any
determinant with real elements. All quantities on the right are available from
calculations except the $A$; consequently this upper bound will be useful only if
satisfactory bounds exist for the minors of the determinant. It can be shown
that (7) holds for any minor of $A$, say $A_{uv}$, if the $A$ have $uv$ added as subscripts;
and therefore it may be applied to the question of the accuracy of least square
solutions.

For the correlation determinant $A$ it can be shown that the magnitude of a
minor of order $n - k$ is bounded by $k!/2^{k/2}$ for $k$ even and $k!/2^{(k-1)/2}$ for $k$ odd.
Setting $\alpha = \epsilon R/H$ and substituting these bounds in (7),

$$\begin{aligned}
|J| &< \alpha m + \alpha^2\,{}_mC_2\,\frac{2!}{2} + \alpha^3\,{}_mC_3\,\frac{3!}{2} + \alpha^4\,{}_mC_4\,\frac{4!}{4} + \cdots\\
&< \alpha m + \frac{\alpha^2m^2}{2} + \frac{\alpha^3m^3}{2} + \frac{\alpha^4m^4}{4} + \cdots\\
&\le \alpha m + \frac{\alpha^2m^2}{2} + \frac{\alpha^3m^3}{2(1 - \alpha m)}
\end{aligned} \tag{8}$$

for $\alpha m < 1$. Since $\alpha m$ is obtainable from the calculations for $A$, this is the
desired upper bound for the error in question.


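3. Probability Bounds. In order to find probability bounds for this error,
it will be necessary to expand the $H$'s since they involve the variables $x$.
Consider $H_k = a^{k-1}_{kk} + x^{k-1}_{kk}$. Since $a^{k-1}_{kk}$ came from repeated reductions of $A$, it is
expressible in terms of the $x$'s and the minors of $A$. To obtain this expansion of
$H_k$ consider

$$G^s = \begin{vmatrix} a^{k-s}_{k-s+1,k-s+1} + x^{k-s}_{k-s+1,k-s+1} & \cdots \\ \vdots & a^{k-s}_{kk} + x^{k-s}_{kk} \end{vmatrix}.$$

Using the same methods as for $F^k$, this may be written as

$$G^s = B^s + \sum x^{k-s}_{ij}B^s_{ij} + \frac{1}{2!}\sum\sum x^{k-s}_{ij}x^{k-s}_{pq}B^s_{ijpq} + \cdots,$$

where $B^s$ is the value of $G^s$ for all $x^{k-s}$ zero, etc., and where $B^s = H_{k-s}^{s-1}G^{s+1}$,
$B^s_{ij} = H_{k-s}^{s-2}G^{s+1}_{ij}$, etc. Substituting,

$$G^s = H_{k-s}^{s-1}G^{s+1} + H_{k-s}^{s-2}\sum x^{k-s}_{ij}G^{s+1}_{ij} + \frac{1}{2!}H_{k-s}^{s-3}\sum\sum x^{k-s}_{ij}x^{k-s}_{pq}G^{s+1}_{ijpq} + \cdots.$$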

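Using operational notation here also, this may be written as

$$G^s = (1 + E + E^2 + \cdots + E^s)H_{k-s}^{s-1}G^{s+1},$$

where the $E$'s operate the same as the $D$'s, except that sums are taken from
$k - s + 1$ to $k$ rather than from $n - s + 1$ to $n$. Treating this as a recursion
formula,

$$H_k = G^1 = (1 + E)H_{k-1}^0(1 + E + E^2)H_{k-2}\cdots(1 + \cdots + E^{k-1})H_1^{k-2}G^k.$$

However,

$$G^k = \begin{vmatrix} a_{11} + x_{11} & \cdots \\ \vdots & a_{kk} + x_{kk} \end{vmatrix} = \begin{vmatrix} a_{11} & \cdots \\ \vdots & a_{kk} \end{vmatrix} = A_k.$$

Consequently,

$$H_k = (1 + E)(1 + E + E^2)H_{k-2}\cdots(1 + \cdots + E^{k-1})H_1^{k-2}A_k. \tag{9}$$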


Since the $E$'s operate on the following $H$'s to reduce their exponents, the number
of terms of various types, that is, of various degrees in the $x$'s, will not be
decreased if the order of $H$'s is disregarded and their exponents held fixed.
Therefore consider

$$H_k = (1 + E)(1 + E + E^2)\cdots(1 + \cdots + E^{k-1})A_kH_{k-2}\cdots H_1^{k-2} \tag{10}$$

as an ordinary recursion formula in the $H$'s for overestimating the number of
terms of various types. If (10) is substituted for successive $H$'s within itself
in a systematic manner until no $H$'s remain, it will be found that

$$H_k = (1 + E)\cdots(1 + \cdots + E^{k-1})A_k\,[(1 + E)\cdots(1 + \cdots + E^{k-3})A_{k-2}]\cdots[(1 + E)A_2]^{2^{k-4}}[A_1]^{2^{k-3}}. \tag{11}$$

To merely count terms it is permissible to combine like terms to give

$$\begin{aligned}
H'_k &= (1 + E)^{1+1+2+2^2+\cdots+2^{k-4}}(1 + E + E^2)^{1+1+2+\cdots+2^{k-5}}\cdots(1 + \cdots + E^{k-1})K\\
&= (1 + E)^{2^{k-3}}(1 + E + E^2)^{2^{k-4}}\cdots(1 + \cdots + E^{k-1})K,
\end{aligned}$$

where $K$ is the product of the $A$'s. Since the $E$'s operate like the $D$'s, the same
arguments as those used to arrive at (6) may be used to replace $(1 + E + \cdots + E^s)$
by $(1 + E)^{s^2}$ for overestimating the number of terms. Hence, the number
of terms of various types in $H_k$ is not greater than those in

$$(1 + E)^{2^{k-3}\cdot 1^2}(1 + E)^{2^{k-4}\cdot 2^2}\cdots(1 + E)^{(k-1)^2} = (1 + E)^{w_k},$$

where $w_k = 1^2\cdot 2^{k-3} + 2^2\cdot 2^{k-4} + \cdots + (k-2)^2\cdot 2^0 + (k-1)^2$. Therefore the
number of terms of various types in $H_k^{n-k-1}\cdots H_1^{n-2}$ is not greater than in

$$(1 + E)^{(n-k-1)w_k}\cdots(1 + E)^{(n-3)w_2} = (1 + E)^t. \tag{12}$$

It is easily shown that $t$ can be condensed into the form

$$t = [2^{k-2}(n-k) - 1] + 2^2[2^{k-3}(n-k) - 1] + \cdots + (k-1)^2[2^0(n-k) - 1]. \tag{13}$$
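The condensation in (13) can be checked mechanically. The sketch below assumes that $w_k = 1^2\cdot 2^{k-3} + 2^2\cdot 2^{k-4} + \cdots + (k-2)^2\cdot 2^0 + (k-1)^2$, that $H_j$ enters with exponent $n - j - 1$ so that $t = \sum_j (n-j-1)w_j$, and that (13) reads $t = \sum_{j=1}^{k-1} j^2[2^{k-1-j}(n-k) - 1]$. These readings are reconstructions from a damaged scan, and the function names are hypothetical.

```python
def w(k):
    """Closed form of w_k as read from the text (w_1 = 0)."""
    if k < 2:
        return 0
    return sum(i * i * 2 ** (k - 2 - i) for i in range(1, k - 1)) + (k - 1) ** 2

def w_recursive(k):
    """Count implied by (10): the E factors give 1^2 + ... + (k-1)^2,
    and H_j (j <= k-2) enters with exponent k-1-j."""
    return (sum(i * i for i in range(1, k))
            + sum((k - 1 - j) * w_recursive(j) for j in range(1, k - 1)))

def t_direct(n, k):
    """t as the sum of (exponent of H_j) times w_j."""
    return sum((n - j - 1) * w(j) for j in range(1, k + 1))

def t_condensed(n, k):
    """t in the condensed form (13)."""
    return sum(j * j * (2 ** (k - 1 - j) * (n - k) - 1) for j in range(1, k))

print([w(k) for k in range(1, 6)])
print(t_direct(7, 5), t_condensed(7, 5))
```

For $n = 7$, $k = 5$ both forms give the same $t$, and $m + t$ then agrees with the value $p = 176$ used in the worked example of the paper.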

From (3) it is evident that the number of terms of various types in $F^k$ will not
be greater than those in the expansion of $F^k$ when the exponents of the $H$'s
are held fixed. But from (6) we have an upper bound for the number of terms
arising from the $D$'s, and from (12) those arising from the $H$'s; hence the number
of terms in question will certainly be bounded by those in

$$(1 + D)^{m+t} = (1 + D)^p. \tag{14}$$


Now consider the magnitude of terms. The terms arising from the operation
of $D$'s contain minors of $A$ as factors, while those arising from the operation
of $E$'s contain minors of $A_i$, where $i$ ranges from 1 to $k$. Let $A'_{ij}$, etc., denote
an upper bound for the magnitudes of all such minors of the same number of
subscripts. It is easily shown that $A'$ with $2r$ subscripts is not less than the
magnitude of the product of several minors whose subscripts total $2r$ in number.
The terms of various types also contain as factors products of the constant
terms in the $H$'s. The constant term in $H_k$, which will be denoted by $h_k$,
can be obtained from (11) by operating with all ones since it will be unaffected
by disregarding the order of operation. Hence,

$$h_k = A_kA_{k-2}A_{k-3}^2\cdots A_2^{2^{k-4}}A_1^{2^{k-3}}.$$

Since the $A_i$ are principal minors of a positive definite determinant with no
element greater than unity, $h_k$ has unity for an upper bound. Thus, an upper
bound for the magnitude of any term in the product of $i$ $x$'s will be $\epsilon^i$ times $A'$
with $2i$ subscripts.

With upper bounds now available for the number of terms and the magnitudes
of terms, we are in a position to consider the complete expansion of $I$ in
which the coefficients of the $x$'s will be constants rather than $H$'s. Evidently
the terms in $x_{ij}$ will come from the terms in $x_{ij}$ of (4) with the $H$'s replaced by
the constant terms in their expansions. If $Z$ denotes these terms, then

$$Z = h_k^{n-k-2}\cdots h_1^{n-3}\left[\sum x^k_{ij}A_{ij} + h_k\sum x^{k-1}_{ij}A_{ij} + \cdots + h_kh_{k-1}\cdots h_2\sum x^1_{ij}A_{ij}\right]. \tag{15}$$

Now consider an upper bound for $|I - Z|$. Since $I - Z$ involves only terms
in the product of two or more $x$'s, we need consider an upper bound for such
terms only. From the results of the two preceding paragraphs, we obtain

$$|I - Z| < \epsilon^2\,{}_pC_2\,A'_{ijpq} + \epsilon^3\,{}_pC_3\,A'_{ijpqrs} + \cdots.$$





But from the paragraph containing (8), bounds are available for the $A'$; hence

$$|I - Z| < \epsilon^2\,{}_pC_2\,\frac{2!}{2} + \epsilon^3\,{}_pC_3\,\frac{3!}{2} + \cdots < \frac{\epsilon^2p^2}{2} + \frac{\epsilon^3p^3}{2(1 - \epsilon p)} = \phi,$$

for $\epsilon p < 1$. Since $Z$ is of order $\epsilon$, $\phi$ will ordinarily be small compared with $Z$;
therefore consider the nature of the distribution of $Z$.

If we write $Z = a_1x_1 + \cdots + a_\nu x_\nu$, then, since the $x$'s are independently
distributed with rectangular distributions, it is easily shown that
$\mu_2 = \frac{\epsilon^2}{3}\sum a_i^2$ and $\alpha_4 = 3 - \frac{6}{5}\sum a_i^4/(\sum a_i^2)^2$. If the $a_i$ are approximately equal
in magnitude, then $\alpha_4$ is approximately equal to $3 - 6/(5\nu)$. But from (15)
$\nu \ge \frac{1}{2}(n-k)^2 + \cdots + \frac{1}{2}(n-1)^2$, which is sufficiently large for determinants
employing Chio's method to justify the assumption that $Z$ is approximately
normally distributed. Setting $L = h_k^{n-k-2}\cdots h_1^{n-3}$,

$$\begin{aligned}
\sigma_Z^2 &< \frac{2\epsilon^2L^2}{3}\left[(n-k)^2 + \cdots + (n-1)^2 - \tfrac{1}{2}\{(n-k) + \cdots + (n-1)\}\right]\\
&< \frac{2\epsilon^2}{3}\left[(n-k)^2 + \cdots + (n-1)^2 - \frac{k}{4}(2n-k-1)\right] = \Psi^2.
\end{aligned}$$

Hence, the probability is > .95 that $|Z| < 2\Psi$. Since $|I - Z| < \phi$, the
probability is > .95 that $|I| < 2\Psi + \phi$; and therefore the probability is > .95
that

$$|J| < \frac{2\Psi + \phi}{C}. \tag{16}$$


This inequality will usually give a smaller bound for $|J|$ than (8). However,
when $A$ is small the $H$'s may be small, with the result that $C$ will be small
and (16) may not give a satisfactory bound for $|J|$. In such cases the bound
given by (8) may not prove satisfactory either.


4. Example. Consider a correlation determinant of order 7 in which the
elements are accurate to 4 decimal places. If Chio's reduction method is
applied until a 2 rowed determinant is obtained, then $n = 7$, $k = 5$, $\epsilon = .00005$,
$m = 90$, $p = 176$, $\Psi = .00005\sqrt{160/3}$, and we obtain from (8) that

$$|J| < \frac{R}{H}(.0045) + \left(\frac{R}{H}\right)^2(.00001) + \left(\frac{R}{H}\right)^3\frac{.00000005}{1 - .0045\,R/H},$$

where $R/H$ is obtained from calculations involved in evaluating the determinant.
From (16) we obtain that the probability is > .95 that

$$|J| < \frac{.0008}{C}.$$

The relative advantage of the second inequality over the first depends on the
size of the pivotal elements, as does the usefulness of either inequality.
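The figures of this example can be reproduced in a few lines. The sketch below assumes $m = (n-k)^2 + \cdots + (n-1)^2$, $t$ as in (13), $p = m + t$, $\Psi^2 = \frac{2\epsilon^2}{3}[m - \frac{k}{4}(2n-k-1)]$ and $\phi = \frac{\epsilon^2p^2}{2} + \frac{\epsilon^3p^3}{2(1-\epsilon p)}$. These forms are read from a damaged scan and should be treated as a reconstruction. The argument `ratio` stands for $R/H$, which in practice comes from the worksheet.

```python
from math import sqrt

n, k, eps = 7, 5, 0.00005

m = sum((n - j) ** 2 for j in range(1, k + 1))
t = sum(j * j * (2 ** (k - 1 - j) * (n - k) - 1) for j in range(1, k))
p = m + t

psi = eps * sqrt(2 / 3 * (m - k * (2 * n - k - 1) / 4))
phi = (eps * p) ** 2 / 2 + (eps * p) ** 3 / (2 * (1 - eps * p))

def absolute_bound(ratio):
    """Right side of (8) with alpha*m = eps*m*ratio, where ratio = R/H."""
    am = eps * m * ratio
    if am >= 1:
        raise ValueError("bound (8) requires alpha*m < 1")
    return am + am ** 2 / 2 + am ** 3 / (2 * (1 - am))

print(m, t, p)
print(round(2 * psi + phi, 4))   # numerator of the probability bound (16)
print(absolute_bound(1.0))       # bound (8) when R/H = 1
```

With $R/H = 1$ the absolute bound is close to .0045, against a numerator near .0008 for the probability bound, which illustrates the comparison drawn in the closing sentence of the example.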

University of California at Los Angeles 



THE CUMULATIVE NUMBERS AND THEIR POLYNOMIALS 

By P. S. Dwyer 

In a recent paper [1] the author has shown how the moments of a distribution
can be obtained from the last entries of cumulative columns with the use of
multiplication by certain numbers. These numbers may be called "cumulative
numbers." It is the aim of this paper to show how these numbers can be
obtained from the expansion of $x^s$ in terms of factorials of the $s$-th order and to
demonstrate properties of the polynomials of which these numbers are the
coefficients.

TABLE I

Successive Frequency Cumulations

  (1)    (2)    (3)    (4)    (5)     (6)     (7)     (8)
   X      x     f_x    C^1    C^2     C^3     C^4     C^5
 a + 6    6      64     64     64      64      64      64
 a + 5    5     192    256    320     384     448     512
 a + 4    4     240    496    816    1200    1648    2160
 a + 3    3     160    656   1472    2672    4320    6480
 a + 2    2      60    716   2188    4860    9180   15660
 a + 1    1      12    728   2916    7776   16956   32616
 a        0       1    729   3645   11421   28377   60993
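The columns of Table I can be regenerated by repeated running totals of the frequency column, taken from the top of the table downward. The sketch below is a minimal illustration. The helper names `cumulations` and `entry` are hypothetical, and `entry(j, i)` returns the $i$-th entry from the bottom of the $j$-th cumulation.

```python
from itertools import accumulate

# Frequencies f_x as listed in Table I, from x = 6 down to x = 0.
f = [64, 192, 240, 160, 60, 12, 1]

def cumulations(freqs, orders):
    """Return the columns C^1, ..., C^orders as successive running totals."""
    cols = [list(freqs)]
    for _ in range(orders):
        cols.append(list(accumulate(cols[-1])))
    return cols[1:]

C = cumulations(f, 5)

def entry(j, i):
    """Entry i (counted from the bottom) of the j-th cumulation."""
    return C[j - 1][-i]

for col in C:
    print(col)
print(entry(1, 1), entry(2, 1), entry(2, 2), entry(5, 4))
```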


1. The values $C^j_i(u_x)$. We use the notation $C^j_i(u_x)$ of the previous paper
[1,289] to express the columnar cumulated entries. The $j$ indicates the order
of the cumulation while the $i$ indicates the number of the term, counting from
the bottom of the column. Thus in Table I, which presents the cumulations
of a frequency distribution used in the previous paper [1,289], $C^1_1 = 729$;
$C^2_1 = 3645$; $C^2_2 = 2916$; $\cdots$, $C^5_4 = 6480$, etc. Now if $k + 1$ values of $x$ are spaced at
unit distances and if the smallest value of $x$ is 0, it can be shown that

$$C^1_1 = \sum_0^k u_x; \qquad C^2_1 = \sum_0^k (x + 1)u_x; \qquad C^3_1 = \sum_0^k \frac{(x + 2)(x + 1)}{2!}u_x;$$

and, in general, for $j \ge 0$ and $j + 1 \ge i$,

$$C^{j+1}_i = \sum_0^k \frac{(x + j - i + 1)^{(j)}}{j!}u_x. \tag{1}$$

Similarly if $k$ values of $x$ are spaced at unit distances and if the smallest value
of $x$ is 1, it can be shown that

$$C^1_1 = \sum_1^k u_x; \qquad C^2_1 = \sum_1^k xu_x; \qquad C^3_3 = \sum_1^k \frac{(x - 1)(x - 2)}{2!}u_x;$$

and, in general, for $j \ge 0$ and $j + 1 \ge i$,

$$C^{j+1}_i = \sum_1^k \frac{(x + j - i)^{(j)}}{j!}u_x. \tag{2}$$

It is to be noted that the coefficients of $u_x$ in (2) could be obtained from the
coefficients of $u_x$ in (1) by the substitution $x + 1 = x'$.


2. The powers in terms of factorials of the s-th order. If the $s$-th powers can
be expressed in terms of factorials of the $s$-th order (factorials having $s$ factors)
then the moments can be expressed in terms of the cumulations. For example

$$x^2 = \frac{(x + 1)x + x(x - 1)}{2!},$$

so, from (1),

$$\sum_0^k x^2f_x = \sum_0^k \frac{(x + 1)^{(2)}}{2!}f_x + \sum_0^k \frac{x^{(2)}}{2!}f_x = C^3_2 + C^3_3.$$

And since

$$x^3 = \frac{(x + 2)^{(3)} + 4(x + 1)^{(3)} + x^{(3)}}{3!},$$

we have

$$\sum_0^k x^3f_x = \sum_0^k \frac{(x + 2)^{(3)}}{3!}f_x + 4\sum_0^k \frac{(x + 1)^{(3)}}{3!}f_x + \sum_0^k \frac{x^{(3)}}{3!}f_x = C^4_2 + 4C^4_3 + C^4_4.$$

In general if

$$x^s = \frac{A_{s1}(x + s - 1)^{(s)} + A_{s2}(x + s - 2)^{(s)} + \cdots + A_{sj}(x + s - j)^{(s)} + \cdots + A_{ss}x^{(s)}}{s!} \tag{3}$$

then

$$\sum_0^k x^sf_x = A_{s1}C^{s+1}_2 + A_{s2}C^{s+1}_3 + \cdots + A_{sj}C^{s+1}_{j+1} + \cdots + A_{ss}C^{s+1}_{s+1}, \tag{4}$$

while if the smallest value of $x$ is 1, we have

$$\sum_1^k x^sf_x = A_{s1}C^{s+1}_1 + A_{s2}C^{s+1}_2 + \cdots + A_{sj}C^{s+1}_j + \cdots + A_{ss}C^{s+1}_s. \tag{5}$$

These quantities, $A_{sj}$, in (4) and (5) are simply the coefficients of certain
factorials of the $s$-th order in the expansion of $x^ss!$.
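The two worked cases of this section can be verified on the frequency distribution of Table I. The sketch below assumes the readings $\sum x^2f_x = C^3_2 + C^3_3$ and $\sum x^3f_x = C^4_2 + 4C^4_3 + C^4_4$, with the lower index counted from the bottom of the column. The helper name `entry` is hypothetical.

```python
from itertools import accumulate

f_top_down = [64, 192, 240, 160, 60, 12, 1]   # f_x for x = 6, 5, ..., 0

# Build the cumulation columns C^1 ... C^4 as successive running totals.
cols = [f_top_down]
for _ in range(4):
    cols.append(list(accumulate(cols[-1])))

def entry(j, i):
    """Entry i (from the bottom) of the j-th cumulation."""
    return cols[j][-i]

xs = range(6, -1, -1)
second_moment = sum(x ** 2 * fx for x, fx in zip(xs, f_top_down))
third_moment = sum(x ** 3 * fx for x, fx in zip(xs, f_top_down))

print(second_moment, entry(3, 2) + entry(3, 3))
print(third_moment, entry(4, 2) + 4 * entry(4, 3) + entry(4, 4))
```

The power sums are recovered from a handful of column entries with the multipliers 1, 1 and 1, 4, 1, which are the cumulative numbers for $s = 2$ and $s = 3$.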




These numbers, for small values of $s$, are easily obtained. It is possible to
use the table and a recursion formula of a previous paper [1,264-296] for larger
values of $s$. It is also possible to obtain these values, without involving cumulative
theory, from (3) above.

While doing this we make a more general approach by expanding $(a + x)^s$
in terms of these same factorials with the coefficients now functions of $a$. This
is possible if we add an additional term, $A_{s0}(x + s)^{(s)}$, to the numerator of the
right hand side of (3). We have then

$$(a + x)^s = \frac{A_{s0}(x + s)^{(s)} + A_{s1}(x + s - 1)^{(s)} + \cdots + A_{sj}(x + s - j)^{(s)} + \cdots + A_{ss}x^{(s)}}{s!}. \tag{6}$$

The determination of the values can be accomplished by purely algebraic
means by successive substitution of $x = 0, 1, 2, \cdots, s$. In this way we obtain
$s + 1$ equations in $s + 1$ unknowns. For example when $s = 2$

$$(a + x)^2 = \frac{A_{20}(x + 2)^{(2)} + A_{21}(x + 1)^{(2)} + A_{22}x^{(2)}}{2!}$$

so that when $x = 0, 1, 2$, we have

$$a^2 = A_{20}; \qquad (a + 1)^2 = 3A_{20} + A_{21}; \qquad (a + 2)^2 = 6A_{20} + 3A_{21} + A_{22}.$$

The solution is $A_{20} = a^2$; $A_{21} = 2ab + 1$; $A_{22} = b^2$, where $b = 1 - a$. It
follows that

$$(a + x)^2 = a^2\frac{(x + 2)^{(2)}}{2!} + (2ab + 1)\frac{(x + 1)^{(2)}}{2!} + b^2\frac{x^{(2)}}{2!}$$

and hence that

$$\sum_0^k (a + x)^2f_x = a^2C^3_1 + (2ab + 1)C^3_2 + b^2C^3_3,$$

as indicated in the previous paper [1,293].

When $a = 0$, then $b = 1$ and we have $\sum x^2f_x = C^3_2 + C^3_3$, while when $a = 1$,
$b = 0$ and the right hand side becomes $C^3_1 + C^3_2$.

It follows that the general cumulative numbers might also be defined as the
solutions of the $s + 1$ equations in the $s + 1$ unknowns obtained by placing
$x = 0, 1, 2, \cdots, s$ in (6).


3. The evaluation of the cumulative numbers. Formal algebraic methods of
evaluating equations (6) are somewhat tedious so we use finite difference theory
to aid in finding the solution. As in the previous paper [1] we use the notation

$$\nabla u_x = u_x - u_{x-1}, \qquad u_x = \begin{cases} v_x & \text{when } a \le x \le a + k,\\ 0 & \text{otherwise.}\end{cases}$$

We then write, from (6),

$$s!(a + x)^s = A_{s0}(x + s)^{(s)} + A_{s1}(x + s - 1)^{(s)} + \cdots + A_{sj}(x + s - j)^{(s)} + \cdots + A_{ss}x^{(s)}. \tag{7}$$

We note further that $\nabla^{s+1}(x + r)^{(s)} = s!$ when $x = s - r$ and is 0 otherwise. We have then

$$\nabla^{s+1}(a + j)^s = A_{sj}. \tag{8}$$

It has been shown in the previous paper [1,292] that

$$\nabla^{s+1}(a + j)^s = \sum_{t=0}^{j} (-1)^t\binom{s+1}{t}(a + j - t)^s \tag{9}$$

and it appears that the cumulative numbers could be defined by (9). A useful
recursion formula has been derived from (9)

$$\nabla^{s+1}(a + x)^s = (a + x)\nabla^s(a + x)^{s-1} + (s + 1 - a - x)\nabla^s(a + x - 1)^{s-1}. \tag{10}$$
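Formulas (9) and (10) lend themselves to a direct check. The sketch below assumes (9) reads $A_{sj} = \sum_{t=0}^{j}(-1)^t\binom{s+1}{t}(a+j-t)^s$ and (10) reads $A_{s,x} = (a+x)A_{s-1,x} + (s+1-a-x)A_{s-1,x-1}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def cumulative_number(s, j, a=0):
    """A_{sj} from the alternating sum read as formula (9)."""
    return sum((-1) ** t * comb(s + 1, t) * (a + j - t) ** s
               for t in range(j + 1))

def row(s, a=0):
    """The cumulative numbers A_{s0}, ..., A_{ss}."""
    return [cumulative_number(s, j, a) for j in range(s + 1)]

def recursion_holds(s, a=0):
    """Check the recursion read as formula (10) for every x = 0, ..., s."""
    prev = row(s - 1, a) + [0]        # A_{s-1,s} is zero
    for x in range(s + 1):
        below = prev[x - 1] if x >= 1 else 0
        rhs = (a + x) * prev[x] + (s + 1 - a - x) * below
        if rhs != cumulative_number(s, x, a):
            return False
    return True

print(row(3), row(4))
print(row(2, Fraction(1, 3)))
print(all(recursion_holds(s, Fraction(1, 3)) for s in range(1, 7)))
```

For $a = 0$ the rows 1, 4, 1 and 1, 11, 11, 1 match the coefficients of $P_3$ and $P_4$, and for general $a$ the second row gives $a^2$, $2ab + 1$, $b^2$ with $b = 1 - a$.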

4. The cumulative polynomials. We define the cumulative polynomials to
be the polynomials obtained by using the cumulative numbers as coefficients.
Thus when $a = 0$,

$$P_1 = y; \quad P_2 = y + y^2; \quad P_3 = y + 4y^2 + y^3; \quad P_4 = y + 11y^2 + 11y^3 + y^4; \text{ etc.}$$

It is possible to derive a recursion formula for these polynomials. We use
(10) with $s$ replaced by $s + 1$ and $a = 0$ and get

$$P_{s+1} = \sum_x \nabla^{s+2}(x)^{s+1}y^x = \sum_x x\nabla^{s+1}(x)^sy^x + \sum_x (s + 2 - x)\nabla^{s+1}(x - 1)^sy^x, \tag{11}$$

which becomes, after some manipulation,

$$P_{s+1} = (1 - y)\sum_x x\nabla^{s+1}(x)^sy^x + (s + 1)yP_s. \tag{12}$$

To illustrate we get $P_4$ from $P_3 = y + 4y^2 + y^3$. Now $\sum x\nabla^4(x)^3y^x = y + 8y^2 + 3y^3$
and $P_4 = (1 - y)(y + 8y^2 + 3y^3) + 4y(y + 4y^2 + y^3) = y + 11y^2 + 11y^3 + y^4$.
The recursion formula (12) can be expressed also in the form of a
differential equation, since $P'_s = \frac{d}{dy}(P_s) = \sum x\nabla^{s+1}(x)^sy^{x-1}$, as

$$P_{s+1} = y[(1 - y)P'_s + (s + 1)P_s]. \tag{13}$$

It can be shown more generally that for any $a$

$$P_{a,0} = 1; \quad P_{a,1} = a + by; \quad P_{a,2} = a^2 + (2ab + 1)y + b^2y^2, \text{ etc., with}$$

$$P_{a,s+1} = y(1 - y)P'_{a,s} + [a(1 - y) + (s + 1)y]P_{a,s} \tag{14}$$

as the recursion formula.
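The recursion can be run on coefficient lists. The sketch below assumes (14) reads $P_{a,s+1} = y(1-y)P'_{a,s} + [a(1-y) + (s+1)y]P_{a,s}$ with $P_{a,0} = 1$, and it stores each polynomial as a list of coefficients in ascending powers of $y$. The function names are hypothetical.

```python
from fractions import Fraction

def next_poly(P, s, a=0):
    """Apply the recursion read as (14) to the coefficient list P of P_{a,s}."""
    out = [0] * (len(P) + 1)
    for i, c in enumerate(P):
        # y(1 - y) d/dy (c y^i) = i c y^i - i c y^(i+1)
        out[i] += i * c
        out[i + 1] -= i * c
        # [a(1 - y) + (s + 1) y] c y^i = a c y^i + (s + 1 - a) c y^(i+1)
        out[i] += a * c
        out[i + 1] += (s + 1 - a) * c
    return out

def cumulative_poly(s, a=0):
    """Coefficients of P_{a,s}, starting from P_{a,0} = 1."""
    P = [1]
    for r in range(s):
        P = next_poly(P, r, a)
    return P

print(cumulative_poly(3), cumulative_poly(4))
print(cumulative_poly(2, Fraction(1, 4)))
```

With $a = 0$ the recursion reduces to (13) and reproduces $P_3$ and $P_4$; with $a = 1/4$ the second polynomial has coefficients $a^2$, $2ab + 1$, $b^2$.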

5. The numerator coefficients in successive derivatives of the logistic function.
Lotka has recently exhibited the coefficients of the numerator terms of successive
derivatives of the logistic function [2, 160]. These appear to be, aside
from sign, the same as the cumulative numbers when $a = 0$. It is shown in
this section that these numbers are the cumulative numbers. The scheme is
generalized to include the numerator coefficients of the derivatives of a more
general function involving the parameter $a$.

Lotka used the function $\Phi_0 = \dfrac{1}{1 + e^{rt}}$, and obtained

$$\Phi_1 = \frac{-re^{rt}}{(1 + e^{rt})^2}, \qquad \Phi_2 = \frac{-r^2e^{rt}(1 - e^{rt})}{(1 + e^{rt})^3}, \text{ etc.}$$

The numerical coefficients are the same if $r = 1$ so we might as well use
$\phi_0 = \dfrac{1}{1 + e^x}$. A more general function is the two parameter function

$$\phi_{a,c,0} = \frac{e^{ax}}{1 + ce^x}. \tag{15}$$

Let successive derivatives with respect to $x$ be indicated by $\phi_{a,c,1}$; $\phi_{a,c,2}$; $\phi_{a,c,3}$;
etc. Then

$$\phi_{a,c,1} = \frac{e^{ax}[a - c(1 - a)e^x]}{(1 + ce^x)^2},$$

$$\phi_{a,c,2} = \frac{e^{ax}[a^2 - (-2a^2 + 2a + 1)ce^x + (1 - a)^2c^2e^{2x}]}{(1 + ce^x)^3}.$$

In general,

$$\phi_{a,c,s} = \frac{e^{ax}Q_{a,c,s}}{(1 + ce^x)^{s+1}},$$

so that

$$\phi_{a,c,s+1} = \frac{e^{ax}\{(1 + ce^x)[aQ_{a,c,s} + Q'_{a,c,s}] - (s + 1)ce^xQ_{a,c,s}\}}{(1 + ce^x)^{s+2}}$$

and

$$Q_{a,c,s+1} = (1 + ce^x)[aQ_{a,c,s} + Q'_{a,c,s}] - (s + 1)ce^xQ_{a,c,s}. \tag{16}$$

The $Q$ functions can be changed to polynomials with the substitution $e^x = y$.
Then derivatives are taken with respect to $y$ and

$$P_{a,c,s+1} = (1 + cy)[aP_{a,c,s} + yP'_{a,c,s}] - (s + 1)cyP_{a,c,s}. \tag{17}$$

When $c = -1$, this becomes formula (14) and since $P_{a,c,0} = 1$, it follows that
the numbers of the present section are generalized cumulative numbers. When
$c = 1$ and $a = 0$ we have the numbers found by Lotka.

It can be shown, further, that the $c$ coefficient of $y^i$ is $c^i$. It follows that the
absolute values of the coefficients, when $c = 1$ and when $c = -1$, are the same.

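6. Formulas for $\sum x^s$. A formula for the sums of the $s$-th powers of the
integers from 1 to $k$ is obtained by summing (3). We get

$$\sum_{x=1}^{k} x^s = \frac{1}{(s + 1)!}\sum_{j=1}^{s} (k + s + 1 - j)^{(s+1)}\nabla^{s+1}(j)^s$$

or

$$\sum_{x=1}^{k} x^s = \sum_{j=1}^{s} C^{s+1}_j(1)\nabla^{s+1}(j)^s.$$

For example

$$\sum_1^k x^2 = \frac{(k + 2)^{(3)} + (k + 1)^{(3)}}{3!} = \frac{k(k + 1)(2k + 1)}{6},$$

$$\sum_1^k x^3 = \frac{(k + 3)^{(4)} + 4(k + 2)^{(4)} + (k + 1)^{(4)}}{4!} = \frac{k^2(k + 1)^2}{4}.$$

More generally the values of $\sum_{a}^{a+k} x^s$ can be evaluated by

$$\sum_{a}^{a+k} x^s = \frac{1}{(s + 1)!}\sum_{j=0}^{s} (k + s + 1 - j)^{(s+1)}\nabla^{s+1}(a + j)^s = \sum_{j=0}^{s} C^{s+1}_{j+1}(1)\nabla^{s+1}(a + j)^s. \tag{21}$$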
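The summation formulas of this section can be tested against direct addition. The sketch below assumes (21) reads $\sum_{x=a}^{a+k} x^s = \frac{1}{(s+1)!}\sum_{j=0}^{s}(k+s+1-j)^{(s+1)}\nabla^{s+1}(a+j)^s$, and it uses the fact that $(k+s+1-j)^{(s+1)}/(s+1)!$ is a binomial coefficient. The function names are hypothetical.

```python
from math import comb

def cumulative_number(s, j, a=0):
    """The difference nabla^(s+1) (a + j)^s, written as an alternating sum."""
    return sum((-1) ** t * comb(s + 1, t) * (a + j - t) ** s
               for t in range(j + 1))

def power_sum(a, k, s):
    """Sum of x^s for x = a, a + 1, ..., a + k via the factorial expansion."""
    return sum(cumulative_number(s, j, a) * comb(k + s + 1 - j, s + 1)
               for j in range(s + 1))

k = 10
print(power_sum(0, k, 2), k * (k + 1) * (2 * k + 1) // 6)
print(power_sum(0, k, 3), k ** 2 * (k + 1) ** 2 // 4)
print(power_sum(3, 4, 5), sum(x ** 5 for x in range(3, 8)))
```

Taking $a = 0$ gives the sums over the integers from 1 to $k$, since the term $x = 0$ contributes nothing when $s \ge 1$.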

7. Summary. It is shown how the cumulative numbers and the cumulative
polynomials may be obtained in a variety of ways. Of special interest is the
fact that the cumulative numbers can be obtained by expanding powers in
terms of factorials and hence they might be called factorial coefficients of a
kind. It is also possible, though it is not within the scope of this paper, to
establish interesting relations between the cumulative numbers and the
multinomial coefficients, the usual factorial coefficients, the differences of 0, etc.

REFERENCES 

[1]. P. S. Dwyer. “The Computation of Moments with the use of Cumulative Totals.”
Annals of Math. Stat. Vol. IX, no. 4, Dec. 1938, pp. 288-304. A more extensive
bibliography is available here.

[2]. A. J. Lotka. “An Integral Equation in Population Analysis.” Annals of Math.
Stat. Vol. X, no. 2, June 1939, pp. 144-161.




University of Michigan 
Ann Arbor, Michigan 




ENUMERATION AND CONSTRUCTION OF BALANCED INCOMPLETE 
BLOCK CONFIGURATIONS 1 

By Gertrude M. Cox 

1. Introduction. One of the general problems of experimental design is to 
avoid extraneous effects in making desired comparisons. The method employed 
is to use experimental materials as nearly homogeneous as possible. Such 
materials, however, are seldom available in large quantities. On the contrary, 
field soils vary in fertility from block to block, animals vary with both litter and 
sex, and leaves on one young plant differ from those on another. Differences 
between blocks, between litters and sex, and between plants, being irrelevant 
to the comparisons usually contemplated, must be avoided. 

When the number of treatments to be compared is small, well known methods
of design, such as the Latin square or randomized complete block, are available
and efficient. As the number of treatments increases, however, these designs
tend to become less efficient through failure to eliminate heterogeneity.
Furthermore, they become cumbersome, the Latin square design requiring replicates
equal in number to the treatments and the complete block design providing that
each treatment occur in every block. (A block is defined as an assemblage of
experimental units chosen to be as nearly alike as possible.)

Because of such limitations, several modifications of the complete block design 
have been devised. These new designs all have the common characteristic that 
the experimental material is divided into groups or blocks containing fewer units 
than the number of treatments to be compared. These more homogeneous 
small blocks are referred to as incomplete blocks. 

It is desirable to have all comparisons between pairs of treatments made with 
equal accuracy. This requires of the design that every pair of treatments 
occur in the same block an equal number of times. Such a design is referred to 
as balanced. Balanced incomplete block designs can be arranged (for any given 
number of treatments) only for certain combinations of block size and number of 
replications. 2 

The construction of balanced incomplete block designs is mathematically a
part of the theory of configurations. A configuration is an assemblage of
elements into sets, each element occurring in the same number of sets, and each
set containing the same number of elements. The configurations to be considered
here are the complete configurations, i.e., those in which each element
occurs an equal number of times in the same set with every other element. It
would be useful to know (a) what configurations (within the useful range)
exist, (b) how these configurations may be constructed.

1 A revision of an expository paper presented under a different title at a joint meeting
of the Institute of Mathematical Statistics and Biometric Section of the American
Statistical Association, December 27, 1939.

2 Numerous additional designs are available in the partially balanced incomplete blocks
[3].

The typical requirement of the experimenter is this: “I wish to test t treatments
and can use blocks of size k (t > k). I should like a design which will
involve as little experimental material as feasible.” The designer must then
determine what configuration of t elements in sets of k will satisfy the incidence
relation that each pair of elements occur together in a set an equal number of
times, and for which the total number of sets is a minimum. There are still
many configurations which the experimenter needs but which have not as yet
been constructed.

In order better to explain the construction of these balanced incomplete block 
designs, it is essential to specify the underlying combinatorial problems. A 
configuration satisfying the condition of balance can be obtained by writing 
down all possible combinations, b, of the t elements taken k at a time, 

$$b = {}_{t}C_{k} = \frac{t!}{k!\,(t - k)!}.$$

The simplest example is that in which each set contains only two elements and 
all possible combinations of the t elements, taken in pairs, appear in the different 
sets. This series of pairs can be written out by the experimenter, and the 
method of analysis is given by Yates [20]. 

Let us take another example; given six elements to be taken three at a time,

$$b = {}_{6}C_{3} = \frac{6!}{3!\,3!} = 20.$$

The 20 combinations are,

   123   134   146   236   345
   124   135   156   245   346
   125   136   234   246   356
   126   145   235   256   456.


Such unreduced designs are not necessarily economical or feasible in experimental 
work. It is often desirable to find some less extensive configuration. In this 
example half of the combinations, either those in italics or the other half, fulfill 
the restriction that every element occur with every other element in the same 
number of sets. Each pair of elements occurs twice in either group of sets. 
Thus, a balanced incomplete block design can be based on either half of the 
20 sets as well as on all 20. 
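The reduction from 20 sets to 10 can be checked by counting pairs. The sketch
below is illustrative only and is not taken from the paper: the particular half
listed in `half` is one balanced choice found by inspection (it is assumed, not
known, to coincide with the half the author marked), and the function names are
hypothetical.

```python
from itertools import combinations
from collections import Counter

elements = range(1, 7)
all_sets = list(combinations(elements, 3))  # the 20 unreduced sets


def pair_counts(sets):
    """Count how often each pair of elements occurs together in a set."""
    counts = Counter()
    for s in sets:
        counts.update(combinations(sorted(s), 2))
    return counts


def is_balanced(sets, elements):
    """True when every pair of elements occurs the same number of times."""
    counts = pair_counts(sets)
    return len({counts[p] for p in combinations(sorted(elements), 2)}) == 1


# One half of the 20 sets (an assumed, illustrative choice) and its complement.
half = [(1, 2, 3), (1, 2, 4), (1, 3, 5), (1, 4, 6), (1, 5, 6),
        (2, 3, 6), (2, 4, 5), (2, 5, 6), (3, 4, 5), (3, 4, 6)]
other_half = [s for s in all_sets if s not in half]

if __name__ == "__main__":
    print(is_balanced(all_sets, elements), is_balanced(half, elements),
          is_balanced(other_half, elements))
```

In the full set of 20 each pair occurs four times; in either half each pair
occurs twice, which is the balance condition described above.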


2. Combinatorial methods. Combinatorial considerations of a simple nature
enable us to set up necessary conditions which balanced designs must satisfy.
We have t elements arranged in b sets of k elements each; each element occurs in r
sets, and each pair of elements occurs together in a set exactly λ times. Then
we must have

$$tr = bk, \qquad r(k - 1) = \lambda(t - 1).$$

The first of these equations expresses the fact that the total number of plots
must be equal both to the product of elements by replications and to the product
of sets by number of elements per set; the second, that the number of pairs into
which a given element enters must equal λ times the remaining number of
elements.

It is convenient to write

$$r = \frac{\lambda(t - 1)}{k - 1}, \qquad b = \frac{\lambda t(t - 1)}{k(k - 1)}.$$

Since the numbers t, b, r, k, λ must be integers, it is easy to obtain lower limits
for any three in terms of the other two.

To give a general classification, the configurations have been divided into
classes according to the value of λ. Because of the practical limitations in
experimentation, table I has been expanded only to include λ = 6 and the k
values from 1-14. It may be well to call attention to the fact that duplications
occur in the different classes of table I. For instance in the class λ = 1, for
k = 6, t = 15m + 1, and m = 1, we have b = 8 and r = 3. In order to construct a
design, the following condition is necessary: r ≥ k and therefore b ≥ t. In this
example, the condition is met if b, r and λ are multiplied by 2; the resulting design
is t = 16, b = 16, r = 6, k = 6 and λ = 2. This configuration is a duplicate
of the design in the class λ = 2, for k = 6 and m = 1. In many of the
configurations where λ is 3, 4, 5, or 6, a common factor can be cancelled from b, r and
λ giving a design listed in the classes λ = 1, 2 or 3.
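The arithmetic of the necessary conditions is easily mechanized. The following
is a minimal sketch, not part of the original paper: the function names are
hypothetical, and the requirement is read here as r ≥ k (equivalently b ≥ t).
Passing these checks does not establish that a design exists.

```python
from fractions import Fraction


def bib_parameters(t, k, lam):
    """Return (r, b) from r = lam(t-1)/(k-1) and b = lam t(t-1)/(k(k-1)).

    Returns None when either value is not an integer. Assumes k >= 2.
    """
    r = Fraction(lam * (t - 1), k - 1)
    b = Fraction(lam * t * (t - 1), k * (k - 1))
    if r.denominator != 1 or b.denominator != 1:
        return None
    return int(r), int(b)


def passes_necessary_conditions(t, k, lam):
    """Integer r and b, together with r >= k and b >= t (necessary only)."""
    params = bib_parameters(t, k, lam)
    if params is None:
        return False
    r, b = params
    return r >= k and b >= t


if __name__ == "__main__":
    # The example of the text: t = 16, k = 6 with lam = 1 and then lam = 2.
    print(bib_parameters(16, 6, 1), passes_necessary_conditions(16, 6, 1))
    print(bib_parameters(16, 6, 2), passes_necessary_conditions(16, 6, 2))
```

For t = 16, k = 6, λ = 1 the sketch gives r = 3, b = 8, which fails r ≥ k;
doubling λ gives r = 6, b = 16, in agreement with the example above.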

It should be emphasized that the conditions under which table I was derived 
are necessary, but not sufficient, for the existence of a complete configuration. 
For example, consider the following configurations which satisfy the necessary 
conditions for a design. 


Sub class
(table I)      m      t      b      r      k      λ

10m + 5        1     15     21      7      5      2
21m + 1        1     22     22      7      7      2
15m + 6        2     36     42      7      6      1
42m + 1        1     43     43      7      7      1
45m + 10       2    100    110     11     10      1
110m + 1       1    111    111     11     11      1.


No configurations of the above specification can actually be constructed. 

TABLE I

A selected group of configurations from table I is given in table II. Only
those configurations whose k, r and λ lie within practical limits, and whose
existence has not been disproved, have been included. The practical limits of
k, r and λ, of course, are dependent upon the conditions surrounding the
experiment. We have chosen to keep k within the range 3 to 10 except for a few special
configurations in which t is greater than 100, in which cases k was allowed to
equal 11-14. Also r has been kept within a similar limited range. (Those
configurations in table II, with an asterisk preceding t, have not been
constructed.)

The above limitations upon k and r give a small, selected group of
configurations. However, many others either have been constructed or are known to
exist. For balanced incomplete block designs, Yates [20] gives the lower limits
of r for t from 4 to 25 and k from 2 to 12 but not greater than ½t. Fisher and
Yates [8] have tabulated the configurations which are known to exist having
ten or fewer replications, including all arithmetically possible configurations the
existence of which has not been disproved.

Even if the existence of a configuration has not been disproved, there still
remains the difficult problem of writing out the elements which are to appear in
each set. Some discussion of the structure of such configurations is presented
by Fisher and Yates [8], by Yates [20, 21], by Goulden [9, 10] and by Bose [4].
Additional descriptions are to follow.

While a search of the literature revealed a number of constructed
configurations, the general theory of their formation has received relatively little
consideration. The question of combinations related to the theory of
configurations which is of interest here was first set forth by Kirkman [11] in 1847. He
states the problem thus: “If $Q_x$ denote the greatest number of triads that can be
formed with x symbols, so that no duad shall be twice employed, then

$$3Q_x = x(x - 1)/2 - V_x$$

if for $V_x$ we put 0, when x = 6m + 1 or 6m + 3.” This gives the formula for b
which was given earlier in this article. Put x = t and $V_x = 0$:

$$b = Q_t = \frac{t(t - 1)}{3 \cdot 2} = \frac{t(t - 1)}{k(k - 1)}.$$

Besides the theory connected with these combinatorial problems, considerable 
information related to the construction of the configurations has been found in 
the literature on finite projective geometry, especially the geometry which applies 
to the theory of groups. 

TABLE II

Selected Group of Configurations
(Balanced Incomplete Block Designs)

    t       b      r    k    λ
    7       7      3    3    1   Y.S.
    7       7      4    4    2
    8      14      7    4    3
    9      12      4    3    1
    9     6+6      2    3    1   L.S.
    9      18      8    4    3
    9      18     10    5    5
    9      12      8    6    5
   10      30      9    3    2
   10      15      6    4    2
   10      18      9    5    4
   10      15      9    6    5
   11      11      5    5    2
   11      11      6    6    3
   13      26      6    3    1
   13      13      4    4    1   Y.S.
   13      13      9    9    6
   15      35      7    3    1
   15      15      7    7    3
   15      15      8    8    4
   16      20      5    4    1
   16    20+20     5    4    2   L.S.
   16      16      6    6    2
   16      16     10   10    6
   19      57      9    3    1
   19      19      9    9    4
   19      19     10   10    5
   21      70     10    3    1
   21      21      5    5    1   Y.S.
  *21      28      8    6    2
  *21      30     10    7    3
  *25      50      8    4    1
   25      30      6    5    1
   25    15+15     3    5    1   L.S.
  *25      25      9    9    3
   28      63      9    4    1
   28      36      9    7    2
  *29      29      8    8    2
   31      31      6    6    1   Y.S.
  *31      31     10   10    3
  *36      45     10    8    2
   37      37      9    9    2
  *41      82     10    5    1
  *46      69      9    6    1
  *46      46     10   10    2
   49      56      8    7    1
   49    28+28     4    7    1   L.S.
  *51      85     10    6    1
   57      57      8    8    1   Y.S.
   64      72      9    8    1
   64    72+72     9    8    2   L.S.
   73      73      9    9    1   Y.S.
   81      90     10    9    1
   81    45+45     5    9    1   L.S.
   91      91     10   10    1   Y.S.
  121     132     12   11    1
  121    66+66     6   11    1   L.S.
  133     133     12   12    1   Y.S.
  169     182     14   13    1
  169    91+91     7   13    1   L.S.
  183     183     14   14    1   Y.S.

* Have not been constructed.
Y.S. Youden squares.
L.S. Lattice squares.

An extensive discussion of the λ = 1 class of configurations (as listed in table I)
can be found in the literature. The theory of the formation of the configurations
for the sub-class t = 6m + 3 has been summarized by Ball [1]. This is the
Kirkman “school-girl problem” for which Eckenstein [7] lists 48 papers and 5
books written during the years 1847-1911 dealing with this subject. The
problem was first published in the Lady's and Gentleman's Diary for 1850 [12].
It is usually stated that “a schoolmistress was in the habit of taking her girls
for a daily walk. The girls were fifteen in number, and were arranged in five
rows of three each, so that each girl might have two companions. The problem
is to dispose of them so that for seven consecutive days no girl will walk with any
of her school-fellows in any triplet more than once.” For this particular
sub-class (t = 6m + 3, k = 3), this type of configuration has been shown to exist
for every possible value of t. Most of the solutions were worked by H. E.
Dudeney and O. Eckenstein. They are given by Ball [1] for all t's less than 100,
that is, for t = 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93 and 99.
Ball describes several methods of constructing such configurations, as cycles,
combinations of cycles, scalene triangles inscribed in the circle, focal and
analytical methods. As an illustration of the school-girl problem, the construction
of the configuration for t = 9, b = 12, r = 4, k = 3 and λ = 1 will be shown.
Scalene triangles are inscribed in a circle with certain specifications (to be
fulfilled) giving the three sets of triplets for the first day as follows,

Set    Group I

(1)    k 1 5
(2)    3 4 6
(3)    7 8 2.

By rotation or by cyclic substitution the other three groups are secured:

Set    Group II       Set    Group III      Set     Group IV

(4)    k 2 6          (7)    k 3 7          (10)    k 4 8
(5)    4 5 7          (8)    5 6 8          (11)    6 7 1
(6)    8 1 3,         (9)    1 2 4,         (12)    2 3 5.

Then placing k = 9, we have the configuration for t = 9, b = 12, and r = 4.
Note that in the school-girl problem the sets are grouped into complete
replications of the elements. This problem of 9 girls taken 3 at a time has been
subjected to an exhaustive examination. There are 840 arrangements but only one
fundamental solution. In the case of 15 girls, the number of fundamental
solutions, according to Mulder [14] and Cole [6], is seven. Ball mentions the
Kirkman problem in quartets, which is the sub-class t = 12m + 4, for k = 4.
He states that this has been solved for cases where m does not exceed 49. He
also states, “I conjecture that similar methods are applicable to corresponding
problems about quintets, sextets, etc.”
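The cyclic substitution for t = 9 can be written out mechanically. The sketch
below is an illustration and not the author's procedure: it assumes that the
substitution advances each of the elements 1 to 8 by one step (8 returning to 1)
while the element written k (here 9) stays fixed; the names `FIXED`, `step` and
`groups` are hypothetical.

```python
from itertools import combinations
from collections import Counter

FIXED = 9  # the element written k in the text
group_one = [(FIXED, 1, 5), (3, 4, 6), (7, 8, 2)]


def step(x, shift):
    """Advance x cyclically on 1..8 by `shift` places; the fixed element stays put."""
    return x if x == FIXED else (x - 1 + shift) % 8 + 1


# Groups I to IV: shifts of 0, 1, 2 and 3 places applied to group I.
groups = [[tuple(step(x, g) for x in s) for s in group_one] for g in range(4)]
sets = [s for group in groups for s in group]

# Every unordered pair of elements should occur together exactly once.
pair_counts = Counter(frozenset(p) for s in sets for p in combinations(s, 2))

if __name__ == "__main__":
    for number, s in enumerate(sets, start=1):
        print(number, s)
```

Each group uses all nine elements once, so the four groups are four complete
replications, and the 36 pairs each occur once (λ = 1).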

Before leaving the school-girl problem, an illustration will be given of t = 28,
b = 63, r = 9, k = 4 and λ = 1. The following framework was set up by Dr.
C. P. Winsor using suggestions from Netto [15].

   k     a     b     c
   a_1   a_8   b_3   b_6
   a_2   a_7   b_1   b_8
   a_3   a_6   c_4   c_5
   a_4   a_5   c_1   c_8
   b_2   b_7   c_3   c_6
   b_4   b_5   c_2   c_7.

a, b and c each have every internal difference once and only once; and each pair
a-b, a-c and b-c must have every external difference once and only once. The
nine groups are given in table III. The cyclic substitution is within three sets,
a, b and c. That is,

in group I, a = 1, a_1 = 2, a_2 = 3, ..., a_8 = 9;

in group II, a = 2, a_1 = 3, a_2 = 4, ..., a_8 = 1;

in group III, a = 3, a_1 = 4, a_2 = 5, ..., a_8 = 2;

etc.
TABLE III

Configuration for t = 28, b = 63, r = 9, k = 4, λ = 1

Framework              Group I       Group II      Group III     Group IV

k    a    b    c       28  1 10 19   28  2 11 20   28  3 12 21   28  4 13 22
a_1  a_8  b_3  b_6      2  9 13 16    3  1 14 17    4  2 15 18    5  3 16 10
a_2  a_7  b_1  b_8      3  8 11 18    4  9 12 10    5  1 13 11    6  2 14 12
a_3  a_6  c_4  c_5      4  7 23 24    5  8 24 25    6  9 25 26    7  1 26 27
a_4  a_5  c_1  c_8      5  6 20 27    6  7 21 19    7  8 22 20    8  9 23 21
b_2  b_7  c_3  c_6     12 17 22 25   13 18 23 26   14 10 24 27   15 11 25 19
b_4  b_5  c_2  c_7     14 15 21 26   15 16 22 27   16 17 23 19   17 18 24 20

Group V       Group VI      Group VII     Group VIII    Group IX

28  5 14 23   28  6 15 24   28  7 16 25   28  8 17 26   28  9 18 27
 6  4 17 11    7  5 18 12    8  6 10 13    9  7 11 14    1  8 12 15
 7  3 15 13    8  4 16 14    9  5 17 15    1  6 18 16    2  7 10 17
 8  2 27 19    9  3 19 20    1  4 20 21    2  5 21 22    3  6 22 23
 9  1 24 22    1  2 25 23    2  3 26 24    3  4 27 25    4  5 19 26
16 12 26 20   17 13 27 21   18 14 19 22   10 15 20 23   11 16 21 24
18 10 25 21   10 11 26 22   11 12 27 23   12 13 19 24   13 14 20 25

Netto [15] discusses t elements in sets of k, every set of 2 elements to occur
together in a set exactly λ times. He deals with λ = 1, and gives a discussion
of both sub-classes when k = 3, that is, for t = 6m + 1 and t = 6m + 3. Reiss
[16] and Moore [13] have proved that configurations can be constructed for all
such values of t if k = 3. This is the type of information which is valuable in
answering the first question in the introduction of this article: “what configurations
exist?” Carmichael [5] mentions the quadruple systems 6m + 2 and 6m + 4
and states that the general problem of their existence appears not to have been
solved. Also for the higher values of k there seems to be very little known of
any generality, but it is known that for k > 3 there are certain configurations
which are not possible.

3. The method of geometrical configuration. Another aid in the construction
of balanced incomplete block designs is found in some of the finite projective
geometries. These are described by Carmichael [5]. A tactical configuration
of rank two is defined as a combination of l elements into m sets, each set
containing λ distinct elements, and each element occurring in μ distinct sets,
where (λ and μ being here the symbols of the tactical configuration, not the λ
of the incidence relation)

l = (t) = number of points in the geometry,
m = (b) = number of lines,
λ = (k) = number of points on a line,
μ = (r) = number of lines on a point.

The series of finite projective geometries PG(k, p^n) for k > 1 furnishes a
certain infinite class of these tactical configurations. The following list gives
those which have been incorporated in the list (table II) of useful balanced
incomplete block designs.


Two dimensional space, PG(2, p^n)

p^n     l (t)    m (b)    λ (k)    μ (r)

2         7        7        3        3
3        13       13        4        4
2^2      21       21        5        5
5        31       31        6        6
7        57       57        8        8
2^3      73       73        9        9
3^2      91       91       10       10
11      133      133       12       12
13      183      183       14       14.

Three dimensional space, PG(3, p^n)

p^n     l (t)    m (b)    μ (r)    λ (k)

2        15       35        7        3.

From the Euclidean geometry EG(k, p^n) for k > 1 other tactical configurations
can be constructed. These are formed from the PG(k, p^n) by omitting a given
line from the two dimensional space and a plane from the three dimensional
space configurations. Some of the resulting designs are:

Two dimensional space, EG(2, p^n)

p^n     l (t)    m (b)    μ (r)    λ (k)

2         4        6        3        2
3         9       12        4        3
2^2      16       20        5        4
5        25       30        6        5
7        49       56        8        7
2^3      64       72        9        8
3^2      81       90       10        9
11      121      132       12       11
13      169      182       14       13.
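The entries of these lists follow from the standard counts of points and lines
in the finite geometries. The sketch below is not part of the paper; it writes
q for p^n and uses the usual formulas (stated here as assumptions of the sketch)
with hypothetical function names, each returning (t, b, r, k).

```python
def pg2_parameters(q):
    """Projective plane PG(2, q): points, lines, lines on a point, points on a line."""
    t = q * q + q + 1
    return t, t, q + 1, q + 1


def eg2_parameters(q):
    """Euclidean plane EG(2, q): points, lines, lines on a point, points on a line."""
    return q * q, q * q + q, q + 1, q


def pg3_parameters(q):
    """Points and lines of the projective space PG(3, q)."""
    points = q ** 3 + q ** 2 + q + 1
    lines = (q * q + 1) * (q * q + q + 1)
    return points, lines, q * q + q + 1, q + 1


if __name__ == "__main__":
    for q in (2, 3, 4, 5, 7, 8, 9, 11, 13):
        print(q, pg2_parameters(q), eg2_parameters(q))
    print(2, pg3_parameters(2))
```

In every case the relations tr = bk and r(k − 1) = t − 1 hold, so that each pair
of points lies on exactly one line (λ = 1 in the sense of the incidence relation).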




Methods are available for constructing the two dimensional space PG(k, p^n)
and the corresponding EG(k, p^n) configurations where p is a prime number.
This being true, we can also construct the completely orthogonalized squares
from the EG(k, p^n) geometry. The reverse situation, in which these
configurations are constructed by using the completely orthogonalized squares, is to be
illustrated. These squares consist of superimposed Latin squares, fulfilling the
condition that each number from the second Latin square occurs once and only
once with each number in the first Latin square. As an example take the two
Latin squares:

Latin Square I       Latin Square II

    1 2 3                1 3 2
    2 3 1                2 1 3
    3 1 2,               3 2 1.

Superimpose square II upon square I to get the completely orthogonalized
3 x 3 square,

    11   23   32
    22   31   13
    33   12   21.


The first number in each cell is a value from square I; the second number in each
cell is from square II. Note that the numbers in the second place in each cell
occur once and only once with each of the first numbers, that is 1-1, 1-3, and 1-2.
The completely orthogonalized squares have been proven to exist for all prime
numbers and for powers of prime numbers. The solution of this problem was
secured independently by Bose [2] and by Stevens [18]. Those of sides $2$, $2^2$, $2^3$,
$2^4$, $2^5$, $2^6$, $3$, $3^2$, $3^3$, $3^4$, $5$, $5^2$, $5^3$, $7$, $7^2$, $11$ and $13$ have been given.

The completely orthogonalized 3 x 3 square may be used to construct a
balanced incomplete block design:

    11 (1)    23 (4)    32 (7)
    22 (2)    31 (5)    13 (8)
    33 (3)    12 (6)    21 (9)

The numbers in parentheses, which follow the cell numbers, designate the 9
elements which are to be arranged in four groups of three sets. Group I is
formed by placing the elements from each row into separate sets; in group II
the elements from the three columns are placed in three sets; in group III the
first set (7) consists of the elements which follow 1 in the first place in the
cells, set (8) consists of the elements which follow 2 in the first place in the
cells, and set (9) of those which follow 3; and group IV is assembled in the same
way as group III except that the numbers in the second place in the cells are
used to select the elements for each set. Thus we have the configuration:

        Group I        Group II       Group III        Group IV
Set     (rows)         (columns)      (first place)    (second place)

        (1) 1 4 7      (4) 1 2 3      (7) 1 6 8        (10) 1 5 9
        (2) 2 5 8      (5) 4 5 6      (8) 2 4 9        (11) 2 6 7
        (3) 3 6 9      (6) 7 8 9      (9) 3 5 7        (12) 3 4 8

In the 12 sets of 3 elements, each of the 9 elements occurs with every other
element once and only once in a set.
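The four groups can be read off the two Latin squares mechanically. The sketch
below is an illustration only; it assumes the element numbers are attached to
the cells as 1, 4, 7 in the first row, 2, 5, 8 in the second and 3, 6, 9 in the
third, and the variable names are hypothetical.

```python
from itertools import combinations
from collections import Counter

square_1 = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]   # Latin square I
square_2 = [[1, 3, 2], [2, 1, 3], [3, 2, 1]]   # Latin square II
labels = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]     # element attached to each cell (assumed)

n = 3
cells = [(i, j) for i in range(n) for j in range(n)]

# Orthogonality: every ordered pair (first place, second place) occurs once.
superimposed = {(square_1[i][j], square_2[i][j]) for i, j in cells}

rows = [[labels[i][j] for j in range(n)] for i in range(n)]            # group I
cols = [[labels[i][j] for i in range(n)] for j in range(n)]            # group II
by_first = [sorted(labels[i][j] for i, j in cells if square_1[i][j] == v)
            for v in (1, 2, 3)]                                        # group III
by_second = [sorted(labels[i][j] for i, j in cells if square_2[i][j] == v)
             for v in (1, 2, 3)]                                       # group IV

sets = rows + cols + by_first + by_second
pair_counts = Counter(frozenset(p) for s in sets for p in combinations(s, 2))

if __name__ == "__main__":
    for number, s in enumerate(sets, start=1):
        print(number, s)
```

The same construction applies to a completely orthogonalized square of any
side for which the orthogonal Latin squares are available, each further square
contributing one more group of sets.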

This is an illustration of one series of configurations which can be constructed
with the aid of the completely orthogonalized squares. These are the EG(k, p^n)
in two dimensional space when k = 2 and p^n = 2, 3, 2^2, 5, 7, 2^3, 3^2, 11, 13, ....
The PG(k, p^n) configurations can be written by adding k + 1 elements (k being
here the number of elements in a set) to the previous group of configurations.
For example, the elements 10, 11, 12 and 13 may be added to the groups, one to
each group. That is, 10 is added to each set in group I, 11 is added to each set
in group II, 12 to group III and 13 to group IV. An additional set must be added
to include these four new elements. A configuration for t = 13, b = 13, k = 4,
r = 4 and λ = 1 results.

Set

(1)  1 4 7 10     (4)  1 2 3 11     (7)  1 6 8 12     (10)  1  5  9 13
(2)  2 5 8 10     (5)  4 5 6 11     (8)  2 4 9 12     (11)  2  6  7 13
(3)  3 6 9 10     (6)  7 8 9 11     (9)  3 5 7 12     (12)  3  4  8 13
                                                      (13) 10 11 12 13.

The 13 sets are made up of 4 elements each. These designs are symmetrical
for sets and elements, that is, every pair of elements occurs together in the same
number of sets, and every pair of sets has the same number of elements in
common. Discussions of the construction of these designs, with illustrations, are
given in references [20, 8, 9] and [19].
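The extension from 12 sets of 3 to 13 sets of 4, and the symmetry just
described, can be verified directly. This is a minimal sketch with hypothetical
variable names, not a procedure given in the paper.

```python
from itertools import combinations
from collections import Counter

# The four groups of three sets for t = 9, as listed in the text.
groups = [
    [(1, 4, 7), (2, 5, 8), (3, 6, 9)],
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    [(1, 6, 8), (2, 4, 9), (3, 5, 7)],
    [(1, 5, 9), (2, 6, 7), (3, 4, 8)],
]
new_elements = [10, 11, 12, 13]

# One new element is added to every set of a group, then the set of new elements.
sets = [s + (e,) for group, e in zip(groups, new_elements) for s in group]
sets.append(tuple(new_elements))

pair_counts = Counter(frozenset(p) for s in sets for p in combinations(s, 2))
common = {len(set(x) & set(y)) for x, y in combinations(sets, 2)}

if __name__ == "__main__":
    print(len(sets), set(pair_counts.values()), common)
```

Every pair of the 13 elements occurs together once, and every pair of sets has
exactly one element in common.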

In the PG(k, p^n) series of designs, as constructed by means of completely
orthogonalized squares, the sets cannot be arranged in replication groups.
However, these configurations can be arranged in Youden squares [22] in which all
the sets are placed side by side and all the elements in a single row form a
complete replication. This method of arrangement has been of considerable value
in experimentation with plants. The Youden squares are the PG(k, p^n) when
k = 2. Singer [17] gives a partial list of the (reduced) perfect difference sets
(table IV), only a single set for each p^n. The number of distinct perfect
difference sets (or the number of distinct perfect partitions) for a given p^n is equal to
φ(q)/3n. Since each perfect difference set can be paired with its inverse, the
number is even.

The construction of one of the Youden squares from its perfect difference set
will be illustrated. Consider p^n = 3; then $q = p^{2n} + p^{n} + 1 = 3^2 + 3 + 1 = 13$.
There are two perfect difference sets with their inverses for q = 13. One perfect
difference set is 0, 1, 3, 9, which has the perfect partition 1, 2, 6, 4: sums of
consecutive terms of the partition, taken cyclically, give each number from 1 to
13 inclusive, the whole partition 1, 2, 6, 4 adding to 13. The elements of the
perfect difference set are put in set (1) except that 13 replaces 0. Set (2) is
secured by a one-step cyclic substitution, 1 for 13, 2 for 1, 4 for 3 and 10 for 9.
This process is continued until there are thirteen sets. If the substitution is
applied to set (13), the elements in set (1) are secured.

                                        Set

               (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13)

Replication A   13   1   2   3   4   5   6   7   8    9   10   11   12
            B    1   2   3   4   5   6   7   8   9   10   11   12   13
            C    3   4   5   6   7   8   9  10  11   12   13    1    2
            D    9  10  11  12  13   1   2   3   4    5    6    7    8.

This is the Youden square for t = 13, b = 13, r = 4, k = 4, and λ = 1. The
elements in each row form a complete replication.
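The cyclic substitution that produces the square is easily programmed. The
following is a minimal sketch under the conventions of the text (residues are
taken modulo q = 13 and the residue 0 is written as 13); the names `label`,
`youden` and `sets` are hypothetical.

```python
from itertools import combinations
from collections import Counter

q = 13
difference_set = [0, 1, 3, 9]


def label(x):
    """Residue of x modulo q, with 0 written as q."""
    return x % q or q


# Row d of the square is the difference-set member d advanced one step per set.
youden = [[label(d + shift) for shift in range(q)] for d in difference_set]

# Column j of the square is set (j + 1) of the design.
sets = [tuple(row[j] for row in youden) for j in range(q)]
pair_counts = Counter(frozenset(p) for s in sets for p in combinations(s, 2))

if __name__ == "__main__":
    for name, row in zip("ABCD", youden):
        print(name, row)
```

Each row is a complete replication of the 13 elements, and each pair of
elements occurs together in exactly one set.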


TABLE IV

Singer's list of perfect difference sets

p^n      q     φ(q)/3n    Perfect difference set

2        7       2        0 1 3
2^2     21       2        0 1 4 14 16
2^3     73       8        0 1 3 7 15 31 36 54 63
2^4    273      12        0 1 3 7 15 31 63 90 116 127 136 181 194 204 233 238 255
3       13       4        0 1 3 9
3^2     91      12        0 1 3 9 27 49 56 61 77 81
5       31      10        0 1 3 8 12 18
7       57      12        0 1 3 13 32 36 43 52
11     133      36        0 1 3 12 20 34 38 81 88 94 104 109
13     183      40        0 1 3 16 23 28 42 76 82 86 119 137 154 175

$q = p^{2n} + p^{n} + 1$.
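The defining property of a perfect difference set, and the associated perfect
partition, can be tested by direct computation. The sketch below is not from
the paper; the function names are hypothetical, and only a few of the smaller
sets of the table are used as examples.

```python
def is_perfect_difference_set(members, q):
    """True if the nonzero differences a - b (mod q) are 1, 2, ..., q - 1, each once."""
    diffs = sorted((a - b) % q for a in members for b in members if a != b)
    return diffs == list(range(1, q))


def perfect_partition(members, q):
    """Cyclic gaps between consecutive members of the set; they add to q."""
    s = sorted(members)
    return [(s[(i + 1) % len(s)] - s[i]) % q for i in range(len(s))]


examples = {
    7: [0, 1, 3],
    13: [0, 1, 3, 9],
    21: [0, 1, 4, 14, 16],
    31: [0, 1, 3, 8, 12, 18],
    57: [0, 1, 3, 13, 32, 36, 43, 52],
}

if __name__ == "__main__":
    for q, members in examples.items():
        print(q, is_perfect_difference_set(members, q), perfect_partition(members, q))
```

For q = 13 the partition obtained is 1, 2, 6, 4, as stated in the text.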


A third series of configurations, called Lattice squares or quasi-Latin squares
[21], can be constructed by using the completely orthogonalized squares. The
four groups of sets obtained above from the completely orthogonalized 3 x 3
square are taken in pairs. For each pair a square is constructed
having its rows formed by the sets of one group and its columns by the sets of
another group. For example, square I below is made so that the sets of group I
form the rows and the sets of group II form the columns. Square II is the
combination of groups III and IV.

Square I         Square II

 1 4 7            1 6 8
 2 5 8            9 2 4
 3 6 9            5 7 3

In this lattice square each pair of elements occurs together once only in either a
row or a column of either one of the squares. Also, every element occurs with
every other element once in one column and one row from each square.
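The property that each pair occurs once in a row or a column can be confirmed
by counting. This is a minimal sketch, assuming the two squares read
1 4 7 / 2 5 8 / 3 6 9 and 1 6 8 / 9 2 4 / 5 7 3; the helper name is hypothetical.

```python
from itertools import combinations
from collections import Counter

square_1 = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
square_2 = [[1, 6, 8], [9, 2, 4], [5, 7, 3]]


def rows_and_columns(square):
    """Rows of the square followed by its columns, each as a tuple."""
    return [tuple(row) for row in square] + [tuple(col) for col in zip(*square)]


lines = rows_and_columns(square_1) + rows_and_columns(square_2)
pair_counts = Counter(frozenset(p) for line in lines for p in combinations(line, 2))

if __name__ == "__main__":
    print(len(lines), set(pair_counts.values()))
```

The 6 rows and 6 columns together form the 12 sets of the balanced design for
t = 9 and k = 3.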

A device known as “complements” gives several configurations. From an
arrangement having k ≠ ½t, a second one can be obtained for the same number
of elements, in sets of t − k units. This is done by replacing each set by its
complement, that is, by a set containing all the elements missing from the
original set. An illustration follows:


t = 7, b = 7            t = 7, b = 7
r = 3, k = 3            r = 4, k = 4
λ = 1                   λ = 2

Set                     Set

(1)  1 2 4              (1)  3 5 6 7
(2)  2 3 5              (2)  1 4 6 7
(3)  3 4 6              (3)  1 2 5 7
(4)  4 5 7              (4)  1 2 3 6
(5)  5 6 1              (5)  2 3 4 7
(6)  6 7 2              (6)  1 3 4 5
(7)  7 1 3,             (7)  2 4 5 6.
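The device of complements is a one-line operation on the sets. The sketch
below is illustrative, with hypothetical names; the relations r' = b − r and
λ' = b − 2r + λ for the complementary design are standard results that are not
stated in the paper.

```python
from itertools import combinations
from collections import Counter

elements = set(range(1, 8))
base = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7), (5, 6, 1), (6, 7, 2), (7, 1, 3)]

# Replace each set by the elements missing from it.
complements = [tuple(sorted(elements - set(s))) for s in base]


def pair_counts(sets):
    """Count how often each pair of elements occurs together in a set."""
    return Counter(frozenset(p) for s in sets for p in combinations(s, 2))


b, r, lam = 7, 3, 1
lam_complement = b - 2 * r + lam   # standard relation (assumed here, not from the text)

if __name__ == "__main__":
    for s, c in zip(base, complements):
        print(s, c)
```

Here λ' = 7 − 6 + 1 = 2, in agreement with the illustration.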

While the triple systems, quadruple systems, etc., which have been considered
by some mathematicians, do furnish designs meeting the balance
requirements, they are usually not suitable for experimental purposes. A
quadruple system requires that every possible triple of elements occur once and only
once together in a block. Since we need only every pair together once (λ = 1)
or more, only the triple systems are generally useful.

4. Summary. The mathematical theory of configuration has been helpful
in the construction of the balanced incomplete block designs. It would be
useful to know (a) what configurations (within the useful range) exist, (b) how these
configurations may be constructed. In table I the configurations have been
classified according to the value of λ, while in table II configurations within a
useful range have been listed. Of the designs in this table which have not been
constructed, some are known to exist. Those aids which have been used in the
construction of the balanced incomplete block designs have been briefly
discussed.

REFERENCES 

[1] Ball, W. W. R. Revised by Coxeter, H. S. M. Mathematical recreations and essays,
Macmillan and Co., London, 11th Edition, 1939.

[2] Bose, R. C. “On the Application of the Properties of Galois Fields to the Problem
of the Construction of Hyper-Graeco Latin Squares,” Sankhyā, Vol. 3(1938),
pp. 323-338.

[3] Bose, R. C. and Nair, K. R. “Partially Balanced Incomplete Block Designs,” Sankhyā,
Vol. 4(1939), pp. 337-372.

[4] Bose, R. C. “On the Construction of Balanced Incomplete Block Designs,” Ann.
of Eugenics, Vol. 9 Part 4 (in the press).

[5] Carmichael, R. D. Introduction to the theory of groups of finite order, Ginn and
Company, 1937.

[6] Cole, F. N. “Kirkman Parades,” Bull. of the Am. Math. Society, Vol. 28(1922),
pp. 435-437.

[7] Eckenstein, O. “Bibliography of Kirkman's School-girl Problem,” Messenger of
Mathematics, Vol. 41(1911), pp. 33-36.

[8] Fisher, R. A. and Yates, F. Statistical tables for biological, agricultural, and medical
research, Oliver and Boyd, Edinburgh, 1938.

[9] Goulden, C. H. “Modern Methods for Testing a Large Number of Varieties,”
Dominion of Canada, Department of Agriculture, Tech. Bull. 9, 1937.

[10] Goulden, C. H. Methods of statistical analysis, John Wiley and Sons, New York,
1939.

[11] Kirkman, T. P. “On a Problem in Combinations,” Cambridge and Dublin Math.
Jour., Vol. 2(1847), pp. 191-204.

[12] Kirkman, T. P. “Query,” Lady's and Gentleman's diary, p. 48, 1850.

[13] Moore, E. H. “Concerning Triple Systems,” Math. Ann., Vol. 43(1893), pp. 271-285.

[14] Mulder, P. Kirkman-Systemen (Groningen Dissertation), Leiden, 1917.

[15] Netto, Eugen. Lehrbuch der combinatorik. Verlag und Druck von B. G. Teubner,
Leipzig und Berlin, 1927. Zweite Auflage.

[16] Reiss, M. “Ueber eine Steinersche combinatorische Aufgabe, welche im 45sten Bande
dieses Journals, Seite 181, gestellt worden ist,” Crelles Journal, Vol. 56(1859),
pp. 326-344.

[17] Singer, James. “A Theorem in Finite Projective Geometry and some Applications
to Number Theory,” Trans. Am. Math. Society, Vol. 43(1938), pp. 377-385.

[18] Stevens, W. L. “The Completely Orthogonalized Latin Square,” Ann. of Eugenics,
Vol. 9(1939), pp. 82-93.

[19] Weiss, Martin G. and Cox, Gertrude M. “Balanced Incomplete Block and Lattice
Square Designs for Testing Yield Differences Among Large Numbers of Soybean
Varieties,” Iowa Agr. Exp. Sta. Res. Bull., 257(1939), pp. 293-316.

[20] Yates, F. “Incomplete Randomized Blocks,” Ann. of Eugenics, Vol. 7(1936), pp.
121-140.

[21] Yates, F. “A Further Note on the Arrangement of Variety Trials: quasi-Latin
Squares,” Ann. of Eugenics, Vol. 7(1937), pp. 319-335.

[22] Youden, W. J. “Use of Incomplete Block Replications in Estimating Tobacco-mosaic
Virus,” Contributions from Boyce Thompson Inst., Vol. 9(1937), pp. 41-48.


Iowa State College, 
Ames, Iowa. 



A COMPARISON OF ALTERNATIVE TESTS OF SIGNIFICANCE FOR 
THE PROBLEM OF m RANKINGS 1 

By Milton Friedman 

A paper published in 1937 [2] suggested that the consilience of a number of
sets of ranks can be tested by computing a statistic designated $\chi_r^2$. A
mathematical proof by S. S. Wilks demonstrated that the distribution of $\chi_r^2$
approaches the ordinary $\chi^2$ distribution as the number of sets of ranks
increases. The rapidity with which this limiting distribution is approached was
investigated by obtaining the exact distributions of $\chi_r^2$ for a number of
special cases. It was concluded that "when the number of sets of ranks is
moderately large (say greater than 5 for four or more ranks) the significance of
$\chi_r^2$ can be tested by reference to the available $\chi^2$ tables" [2, p. 695].
The use of the normal distribution was recommended when the number of ranks in
each set is large, but the number of sets of ranks is small, although no rigorous
justification of this procedure was presented.

Except for the few special cases for which exact distributions were given, the
paper did not provide a test of significance for data involving less than six sets of
ranks and a small or moderate number of ranks in each set. This important
gap has now been filled by M. G. Kendall and B. Babington Smith [1]. In
addition, they furnish a somewhat more exact test of significance for tables of
ranks for which the earlier article recommended the use of the $\chi^2$ distribution.

$^1$ The author is indebted to Mr. W. Allen Wallis for valuable criticism and to Miss Edna
R. Ehrenberg for computational assistance.

$^2$ This is Kendall and Smith's notation, which will be used in the present paper. The
original paper [2] designated the number of items ranked by $p$, and the number of sets of
ranks by $n$.

Kendall and Smith use a different statistic, $W$, defined as $\chi_r^2$ divided by its
maximum value, $m(n - 1)$, where $n$ is the number of items ranked, and $m$ the
number of sets of ranks.$^2$ The new statistic (independently suggested by W.
Allen Wallis [3], who terms it the rank correlation ratio and denotes it by $\eta_r^2$) is
thus not fundamentally different from $\chi_r^2$. A more radical innovation is the
improvement in the test of significance that they suggest. Instead of testing
$\chi_r^2$ by reference to the $\chi^2$ distribution for $n - 1$ degrees of freedom, Kendall and
Smith, generalizing from the first four moments of $W$, recommend that the
significance of $W$ be tested by reference to the analysis of variance distribution
(Fisher's z-distribution) with

$$z = \tfrac{1}{2}\log_e \frac{(m-1)W}{1-W}, \qquad n_1 = (n-1) - \frac{2}{m}, \qquad n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right].$$

For small values of $m$ and $n$, they introduce continuity corrections, substituting for

$$W = \frac{12S}{m^2(n^3 - n)}$$

the statistic

$$\frac{S - 1}{\dfrac{m^2(n^3-n)}{12} + 2} = \frac{W - \dfrac{12}{m^2(n^3-n)}}{1 + \dfrac{24}{m^2(n^3-n)}},$$

where $S$ is the observed sum of squares of the deviations of sums of ranks from
the mean value, $m(n + 1)/2$. Comparison with exact distributions of $W$ (or $S$)
for special cases indicates that this test yields very good approximations to the
correct probabilities.
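The definitions above can be put together in a short computational sketch. It is only an illustration under stated assumptions: the function names `rank_statistics` and `kendall_smith_z` are hypothetical, ties are assumed absent, and the relation $\chi_r^2 = m(n-1)W$ is taken from the definition of $W$ given in the text.

```python
import math

def rank_statistics(ranks):
    """Return (S, W, chi_r2) for m rankings of n items.

    `ranks` is a list of m rows; each row is a permutation of 1..n (no ties).
    """
    m = len(ranks)
    n = len(ranks[0])
    # Sum of the ranks received by each item over the m sets of ranks.
    sums = [sum(row[j] for row in ranks) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    # S: sum of squared deviations of the rank sums from their mean.
    S = sum((t - mean_sum) ** 2 for t in sums)
    W = 12 * S / (m ** 2 * (n ** 3 - n))
    chi_r2 = m * (n - 1) * W
    return S, W, chi_r2

def kendall_smith_z(S, m, n):
    """Return (z, n1, n2) using the continuity-corrected W described above."""
    d = m ** 2 * (n ** 3 - n) / 12
    W_corr = (S - 1) / (d + 2)          # continuity correction
    n1 = (n - 1) - 2 / m
    n2 = (m - 1) * n1
    z = 0.5 * math.log((m - 1) * W_corr / (1 - W_corr))
    return z, n1, n2

# Hypothetical data: m = 4 judges ranking n = 4 items.
example = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4], [1, 2, 4, 3]]
S, W, chi_r2 = rank_statistics(example)
z, n1, n2 = kendall_smith_z(S, m=4, n=4)
```

The resulting $z$ would still have to be referred to tables of the z-distribution with the fractional degrees of freedom $n_1$ and $n_2$; that lookup is not attempted here.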

In the limit the two tests of significance are identical. Neglecting the
correction for continuity,

$$z = \tfrac{1}{2}\log_e \frac{(m-1)\chi_r^2}{m(n-1) - \chi_r^2} \rightarrow \tfrac{1}{2}\log_e \frac{\chi_r^2}{n-1}, \qquad n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right] \rightarrow \infty,$$

and $n_1 = (n-1) - \frac{2}{m} \rightarrow (n-1)$ as $m \rightarrow \infty$. For
$n_2 = \infty$, the analysis of variance distribution is identical with the distribution
of $\tfrac{1}{2}\log_e \frac{\chi^2}{n_1}$. The difference between the two tests is thus that one, $\chi^2$, uses
a single (limiting) distribution for all values of $m$, whereas the other, $z$, adapts
the distribution to the value of $m$.

The necessity of taking into account the value of $m$, while it increases the
flexibility of the distribution, makes the $z$ test somewhat less convenient in
practice than the $\chi^2$ test. Additional computation is required to obtain the
values of $n_1$ and $n_2$, and to make the continuity corrections. It is also fairly
laborious to test the significance of the result, if exact values of $z$ at any level of
significance are required. In these instances, two-way interpolation of reciprocals
in the analysis of variance tables is necessary since both $n_1$ and $n_2$ are
always fractional. These difficulties make it desirable to investigate the rapidity
with which the significance levels given by the $z$ test approach those given by the
$\chi^2$ test, and thus determine the range of values of $m$ and $n$ for which the simpler
test can safely be employed. This investigation will yield as a by-product the
.05 and .01 significance values of $\chi_r^2$ (or $W$ or $S$) for selected values of $m$ and $n$ as
determined by the $z$ test.

Table I presents a summary comparison of the values of $\chi_r^2$ at the .05 and .01
levels of significance as shown by (1) exact distributions, (2) the $z$ test with
continuity corrections, (3) the $\chi^2$ test.$^3$ The significance values are expressed in
terms of $\chi_r^2$ rather than $W$ because, for a given number of ranks per set (i.e., a
given $n$), the significance values given by the $\chi^2$ test are the same regardless of the
number of sets of ranks (i.e., of the value of $m$). This would not be so if $W$
were employed, since $W = \chi_r^2 / m(n - 1)$. The expected value of $W$ depends on
$m$ and approaches zero as $m \rightarrow \infty$, while the expected value of $\chi_r^2$ is equal to $n - 1$
for all values of $m$.

$^3$ The values of $\chi_r^2$ computed using the $z$ test that are given in Tables I and II were
obtained with the aid of Fisher and Yates' Table V [4]. Linear interpolation of reciprocals
was employed throughout.

The values given by the $z$ test agree remarkably well with the exact values.
With but two exceptions (the .01 values for $n = 3$, $m = 8$ and 10) the exact
value differs very much less from the value given by the $z$ test than from the
value given by the $\chi^2$ test. In all but three of the 12 comparisons, the $z$ test
gives a value below the correct one.$^4$

TABLE I

Comparison of Values of $\chi_r^2$ at .05 and .01 Levels of Significance Yielded by Exact
Distributions, z Test with Continuity Corrections, and $\chi^2$ Test

Column key: "Limits" and "Interp." are from the exact distribution (limits and
interpolated value*); "z test" is from the z test with continuity corrections;
"χ² test" is from the $\chi^2$ test.

          .05 Level of Significance             .01 Level of Significance
n    m    Limits      Interp.  z test  χ² test  Limits      Interp.  z test  χ² test
3    8    5.25-6.25   6.16     6.012   5.991                9.00     8.35    9.21
     9    6.0-6.22    6.17     6.004   5.991                8.67     8.44    9.21
     10   5.6-6.2     6.08     5.999   5.991    8.6-9.6     9.04     8.51    9.21
     ∞                         5.991   5.991                         9.21    9.21
4    4    7.5-7.8     7.54     7.43    7.82     9.3-9.6     9.42     9.21    11.34
     5    7.32-7.8    7.54     7.52    7.82     9.72-9.96   9.87     9.66    11.34
     6    7.4-7.6     7.49     7.57    7.82                 10.00    9.95    11.34
     ∞                         7.82    7.82                          11.34   11.34
5    3    8.27-8.53   8.41     8.59    9.49     9.87-10.13  10.05    10.08   13.28
     ∞                         9.49    9.49                          13.28   13.28

* Computed by linear interpolation of probabilities.


$^4$ These comparisons duplicate some of those made by Kendall and Smith and merely
serve to confirm their conclusion that the $z$ test with continuity corrections gives
exceedingly good results.

The values obtained using the $z$ test without continuity corrections agree less well with
the exact values than those obtained with the aid of the continuity corrections. However,
even if no continuity corrections are made, the $z$ test in general yields values closer to the
exact values than does the $\chi^2$ test.

Table II gives for a very much larger number of values of $m$ and $n$ the .05
and .01 values of $\chi_r^2$ computed on the basis of the $z$ test with continuity
corrections. The values entered for $m = \infty$ are obtained from $\chi^2$ tables for $n - 1$
degrees of freedom and are the significance values by the $\chi^2$ test for all values of
$m$. It is apparent that as $m$ increases the .01 and .05 values of $\chi_r^2$ approach their
limiting values very rapidly. For $n = 7$, two-thirds of the difference between
the .05 values for $m = 3$ and $m = \infty$, and an even larger proportion of the
difference between the .01 values, disappears by the time $m = 10$; and the
situation is similar for the other values of $n$. Except for the .05 values for $n = 3$,
the approach to the limit is monotonic from below. The use of the $\chi^2$ test thus
tends to lead to the overestimation of the significance values and of the
probabilities attached to observed values of $\chi_r^2$. It is clear, however, that for large and
even moderate values of $m$ the $\chi^2$ test is, for all practical purposes, equivalent
to the $z$ test.

TABLE II

Values of $\chi_r^2$ at .05 and .01 Levels of Significance Computed on the Basis of Kendall
and Smith's z test, with Continuity Corrections; .10, .075, .02, .015 Values of $\chi^2$

Values at .05 Level of Significance

m          n = 3    n = 4    n = 5    n = 6    n = 7
3                            8.59     9.90     11.24
4                   7.43     8.84     10.24    11.62
5                   7.52     8.98     10.42    11.84
6                   7.57     9.08     10.54    11.97
8          6.012    7.63     9.18     10.68    12.14
10         5.999    7.67     9.25     10.76    12.23
15         5.985    7.72     9.33     10.87    12.36
20         5.983    7.74     9.37     10.92    12.42
100        5.987    7.80     9.46     11.04    12.56
∞          5.991    7.82     9.49     11.07    12.59
χ²(.10)    4.605    6.25     7.78     9.24     10.64
χ²(.075)*  5.18     6.90     8.49     10.00    11.45

Values at .01 Level of Significance

m          n = 3    n = 4    n = 5    n = 6    n = 7
3                            10.08    11.69    13.26
4                   9.21     10.93    12.59    14.19
5                   9.66     11.42    13.11    14.74
6                   9.95     11.74    13.45    15.09
8          8.35     10.31    12.13    13.87    15.53
10         8.51     10.52    12.37    14.11    15.79
15         8.74     10.79    12.67    14.44    16.14
20         8.85     10.93    12.82    14.60    16.31
100        9.14     11.26    13.19    14.99    16.71
∞          9.21     11.34    13.28    15.09    16.81
χ²(.02)    7.82     9.84     11.67    13.39    15.03
χ²(.015)*  8.40     10.46    12.34    14.09    15.77

* Computed from Fisher and Yates' Table IV [4] by linear interpolation between the
logarithms of the probabilities.

In order to determine more precisely the range of values of $m$ and $n$ for which
the approximation given by the $\chi^2$ test is adequate, it is necessary to adopt some
convention about the error in estimated significance values of $\chi_r^2$ that is tolerable.
Since the conclusion drawn from an observed $\chi_r^2$ depends on the probability
that it will be exceeded by chance, this convention clearly should be expressed in
terms of the error in the probability.

The structure of published $\chi^2$ tables makes it convenient to accept an estimated
probability between .10 and .05 as a tolerable approximation to a correct
probability of .05, and an estimated probability between .02 and .01 as a tolerable
approximation to a correct probability of .01. These ranges of tolerance are
entirely on one side of the correct probability because, as pointed out above, the
error in using the $\chi^2$ test is consistent in direction. These ranges are purely
arbitrary, of course, and many may think them too broad.

On the basis of this or some similar convention it is possible to make objective
statements concerning the range of values of $m$ and $n$ for which the $\chi^2$ test is
adequate. The next to the last line in the first section of Table II gives the .10
values of $\chi^2$; the next to the last line in the second section, the .02 values. All
the .05 values of $\chi_r^2$ shown in the table exceed the .10 value of $\chi^2$. Using the $\chi^2$
test, all of the values (with two exceptions for $n = 3$) would signify a probability
greater than .05 but less than .10. Thus the error made at the .05 level is
within the admissible range according to the suggested convention. The $\chi^2$
test is therefore an adequate substitute for the $z$ test at the .05 level for all
values of $m$ and $n$ except possibly for a few of the values for which exact
distributions are available.

As might be expected, the $\chi^2$ test is less satisfactory at the .01 level. For
values of $m$ less than six, the .01 values of $\chi_r^2$ computed using the $z$ test with
continuity corrections are less than the .02 value of $\chi^2$. For $m$ greater than 5,
the values of $\chi_r^2$ in the table would all be accorded a probability greater than .01
but less than .02 if the $\chi^2$ test were employed. As already noted, this is the range
of values of $m$ for which the original paper suggested the $\chi^2$ test could validly be
used [2, p. 695].
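The check described in the preceding paragraph can be reproduced mechanically from the .01 section of Table II. The sketch below is only an illustration: the dictionary layout and the function name `smallest_adequate_m` are assumptions made here, and the numbers are transcribed from Table II (blank cells are stored as `None`).

```python
# Tabled values of m, in the order of the rows of Table II.
M_VALUES = [3, 4, 5, 6, 8, 10, 15, 20, 100]

# .01 values of chi_r^2 from the z test with continuity corrections, keyed by n.
Z_TEST_01 = {
    3: [None, None, None, None, 8.35, 8.51, 8.74, 8.85, 9.14],
    4: [None, 9.21, 9.66, 9.95, 10.31, 10.52, 10.79, 10.93, 11.26],
    5: [10.08, 10.93, 11.42, 11.74, 12.13, 12.37, 12.67, 12.82, 13.19],
    6: [11.69, 12.59, 13.11, 13.45, 13.87, 14.11, 14.44, 14.60, 14.99],
    7: [13.26, 14.19, 14.74, 15.09, 15.53, 15.79, 16.14, 16.31, 16.71],
}

# .02 values of chi^2 for n - 1 degrees of freedom (next to last line of Table II).
CHI2_02 = {3: 7.82, 4: 9.84, 5: 11.67, 6: 13.39, 7: 15.03}

def smallest_adequate_m(n):
    """Smallest tabled m from which every tabled .01 value exceeds chi^2(.02)."""
    adequate_from = None
    for m, value in zip(M_VALUES, Z_TEST_01[n]):
        if value is None:
            continue
        if value > CHI2_02[n]:
            if adequate_from is None:
                adequate_from = m
        else:
            adequate_from = None  # a later failure resets the search
    return adequate_from

thresholds = {n: smallest_adequate_m(n) for n in sorted(Z_TEST_01)}
```

For $n = 4$ to 7 the threshold comes out as $m = 6$, in agreement with the statement that the $\chi^2$ test is acceptable at the .01 level for $m$ greater than 5; for $n = 3$ the first tabled value ($m = 8$) already qualifies.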

In view of the arbitrary nature of the convention as to the permissible error
in the probability attached to an observed value of $\chi_r^2$, it is interesting to
investigate the effect of an alternative and stricter convention, namely, that only
probabilities from .075 to .05 and from .015 to .01 be accepted as approximations
to correct probabilities of .05 and .01 respectively. The .075 and .015 values of
$\chi^2$ are given in the last lines of the two sections of Table II. On the basis of this
convention the $\chi^2$ test is adequate at the .05 level for $m$ greater than three, and
at the .01 level for $m$ greater than nine, except possibly for a few of the values
for which exact distributions are available. Thus even so drastic a lowering of
the permissible margin of error as halving it limits only slightly the range of
values of $m$ for which the $\chi^2$ test is adequate.

TABLE III

Values of S at .05 and .01 Levels of Significance Computed on the Basis of Kendall
and Smith's z test, with Continuity Corrections

Values at .05 Level of Significance

                                                      Additional values
                                                      for n = 3
m     n = 3    n = 4    n = 5    n = 6    n = 7       m     S
3                       64.4     103.9    157.3       9     54.0
4              49.5     88.4     143.3    217.0       12    71.9
5              62.6     112.3    182.4    276.2       14    83.8
6              75.7     136.1    221.4    335.2       16    95.8
8     48.1     101.7    183.7    299.0    453.1       18    107.7
10    60.0     127.8    231.2    376.7    571.0
15    89.8     192.9    349.8    570.5    864.9
20    119.7    258.0    468.5    764.4    1158.7

Values at .01 Level of Significance

                                                      Additional values
                                                      for n = 3
m     n = 3    n = 4    n = 5    n = 6    n = 7       m     S
3                       75.6     122.8    185.6       9     75.9
4              61.4     109.3    176.2    265.0       12    103.5
5              80.5     142.8    229.4    343.8       14    121.9
6              99.5     176.1    282.4    422.6       16    140.2
8     66.8     137.4    242.7    388.3    579.9       18    158.6
10    85.1     175.3    309.1    494.0    737.0
15    131.0    269.8    475.2    758.2    1129.5
20    177.0    364.2    641.2    1022.2   1521.9

Table II provides, of course, a direct means of testing the significance of
observed values of $\chi_r^2$ for the tabled values of $m$ and $n$. For this purpose,
however, Table III, giving the significance values of $S$, is more useful, since it obviates
the necessity of converting $S$ into $\chi_r^2$. For $n = 3$ Table III includes a few
values of $m$ in addition to those in Table II.
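The conversion that Table III makes unnecessary follows from $W = 12S / m^2(n^3 - n)$ and $\chi_r^2 = m(n-1)W$, which combine to $\chi_r^2 = 12S / mn(n+1)$. A minimal sketch, with hypothetical function names, shows the arithmetic in both directions:

```python
def chi_r2_from_S(S, m, n):
    """Convert the sum of squares S into chi_r^2 for m sets of n ranks."""
    return 12 * S / (m * n * (n + 1))

def S_from_chi_r2(chi_r2, m, n):
    """Inverse conversion: the value of S corresponding to a given chi_r^2."""
    return chi_r2 * m * n * (n + 1) / 12

# Table II gives 8.59 as the .05 value for n = 5, m = 3.
s_05 = S_from_chi_r2(8.59, m=3, n=5)
```

Up to rounding, `s_05` reproduces the corresponding entry of Table III (64.4).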

SUMMARY 

The preceding analysis suggests that the $\chi^2$ test of the significance of $\chi_r^2$
(or $W$ or $\eta_r^2$), while less accurate than the $z$ test proposed by Kendall and Smith,
is adequate for practical purposes at the .01 level of significance if the number of
sets of ranks ($m$) is greater than 5; and at the .05 level for any number of sets of
ranks, provided the number of ranks in each set ($n$) is more than 3. Exact
distributions are now available for $n = 3$, $m = 3$ to 10; $n = 4$, $m = 3$ to 6;
$n = 5$, $m = 3$ [1]. The .05 and .01 values of $\chi_r^2$ and $S$, computed using the
Kendall and Smith $z$ test with continuity corrections, are given in Tables II
and III of the present note for $n = 3$ to 7 and selected values of $m$ from 3 to 100.
For $n$ greater than 7 and $m$ less than 6, the $z$ test with continuity corrections
should be employed. For all other combinations of $n$ and $m$ not covered by the
exact distributions or by Tables II and III, the $\chi^2$ test is adequate.

REFERENCES 

[1] M. G. Kendall and B. Babington Smith, "The Problem of m Rankings," Annals of
Math. Stat., Vol. X (1939), pp. 275-87.

[2] Milton Friedman, "The Use of Ranks to Avoid the Assumption of Normality Implicit
in the Analysis of Variance," Jour. Am. Stat. Assn., Vol. XXXII (1937), pp.
675-701.

[3] W. Allen Wallis, "The Correlation Ratio for Ranked Data," Jour. Am. Stat. Assn.,
Vol. XXXIV (1939), pp. 533-8.

[4] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
Research (London: Oliver and Boyd, 1938).

National Bureau of Economic Research, 

New York. 



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


NOTE ON AN APPROXIMATE FORMULA FOR THE SIGNIFICANCE 

LEVELS OF Z 

By W. G. Cochran 


1. Introduction. An important part has been played in modern statistical
analysis by the distribution of $z = \tfrac{1}{2}\log_e \frac{s_1^2}{s_2^2}$, where $s_1^2$ and $s_2^2$ are two independent
estimates of the same variance. In particular, all tests of significance in the
analysis of variance and in multiple regression problems are based on this
distribution. Complete tabulation of the frequency distribution of $z$ is a heavy
task, because the distribution is a two-parameter one, the parameters being the
numbers of degrees of freedom, $n_1$ and $n_2$, in the estimates $s_1^2$ and $s_2^2$. Thus each
significance level of $z$ requires a separate two-way table. Fisher constructed a
table of the 5 percent points in 1925 [1], and this has since been extended by
several workers [2] to the 20, 1, and 0.1 percent levels for a somewhat wider range
of values of $n_1$ and $n_2$.

With his original table, Fisher gave an approximate formula for the 5 percent
values of $z$, for high values of $n_1$ and $n_2$ outside the limits of his table. The
formula reads:

(1)    $z \ (5 \text{ percent}) = \dfrac{1.6449}{\sqrt{h - 1}} - 0.7843\left(\dfrac{1}{n_1} - \dfrac{1}{n_2}\right),$

where $\dfrac{2}{h} = \dfrac{1}{n_1} + \dfrac{1}{n_2}$.

The constant 1.6449 is the 5 percent significance level for a single tail of the
normal distribution, and the constant 0.7843 will be found to be $\tfrac{1}{6}\{2 + (1.6449)^2\}$.
Thus the general formula for the significance levels of $z$ derivable from (1) is

$$z = \frac{x}{\sqrt{h - 1}} - \frac{1}{6}(x^2 + 2)\left(\frac{1}{n_1} - \frac{1}{n_2}\right),$$

where $x$ is a normal deviate with unit standard error. By inserting the
appropriate significance level of $x$, this formula has been extended [2] to the tables of
the 20, 1, and 0.1 percent levels of $z$ and commonly appears with all published
tables of $z$. The objects of this note are to indicate the derivation of the
formula and to suggest an improvement upon it in the latter cases.
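As a rough check on the constants quoted above, the general formula can be evaluated directly. This is only a sketch: the function names are hypothetical, and the normal deviate is taken from Python's standard `statistics.NormalDist` rather than from printed tables.

```python
import math
from statistics import NormalDist

def harmonic_h(n1, n2):
    """h defined by 2/h = 1/n1 + 1/n2."""
    return 2.0 / (1.0 / n1 + 1.0 / n2)

def z_fisher(p, n1, n2):
    """Approximate upper-tail significance level of z from the general formula."""
    x = NormalDist().inv_cdf(1.0 - p)   # single-tail normal deviate
    h = harmonic_h(n1, n2)
    return x / math.sqrt(h - 1.0) - (x * x + 2.0) / 6.0 * (1.0 / n1 - 1.0 / n2)

x05 = NormalDist().inv_cdf(0.95)        # about 1.6449
const = (2.0 + x05 ** 2) / 6.0          # about 0.7843
z05 = z_fisher(0.05, 40, 60)
```

With $n_1 = 40$ and $n_2 = 60$ the function returns a value close to .2334, the figure quoted for formula (1) later in the note.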





2. The transformation of the z-distribution to normality. For high values
of $n_1$ and $n_2$, the distribution of $z$ approaches the normal distribution, the
principal deviation being a slight skewness introduced by the inequality of $n_1$
and $n_2$. It is therefore natural to seek an approximate formula for the
distribution of $z$ by examining its relation to the normal distribution. For the
z-distribution the ratio $\kappa_r / \kappa_2^{r/2}$, where $\kappa_r$ is the $r$th cumulant, is of the order
$n^{-(r-2)/2}$, where $n$ is the smaller of $n_1$ and $n_2$. This property is common to a large
number of distributions which tend to normality; for example, the distribution
of the mean of a sample of size $n$ from any distribution with finite cumulants.
Fisher and Cornish [3] have recently given a method, applicable to all
distributions with this property, for transforming the distribution to a normal
distribution to any desired order of approximation. They also obtained explicit
expressions for the significance levels of the original distribution in terms of the
significance levels of the normal distribution, discussing the z-distribution as a
particular example. The relation between $z$ and the normal deviate $x$ at the
same level of probability was found to be

(2)    $z = \dfrac{x}{\sqrt{h}} - \dfrac{1}{6}(x^2 + 2)\left(\dfrac{1}{n_1} - \dfrac{1}{n_2}\right) + \dfrac{1}{\sqrt{h}}\left\{\dfrac{x^3 + 3x}{12h} + \dfrac{x^3 + 11x}{144}\,h\left(\dfrac{1}{n_1} - \dfrac{1}{n_2}\right)^2\right\},$

the three terms on the right hand side being respectively of order $n^{-1/2}$, $n^{-1}$, and
$n^{-3/2}$, so that terms of order $n^{-2}$ are neglected.$^1$

If this equation is compared with equation (1), the latter appears at first
sight to be the approximation of order $n^{-1}$ to the z-distribution, except that the
divisor of $x$ is $\sqrt{h - 1}$ in (1) and $\sqrt{h}$ in (2). Computation of a few values
shows that at the 5 percent level, equation (1) is the better approximation. For
example, for $n_1 = 40$, $n_2 = 60$, (1) gives $z$ (5 percent) = .2334, (2) gives .2309,
and the exact value is .2332.
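The comparison just quoted can be retraced numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the terms are those of equation (2) as reconstructed from the discussion that follows in the note, and the figure .2309 is read as the sum of the first two terms only (the approximation of order $n^{-1}$).

```python
import math
from statistics import NormalDist

def cornish_fisher_terms(x, n1, n2):
    """Return the three terms of equation (2), of order n^-1/2, n^-1, n^-3/2."""
    h = 2.0 / (1.0 / n1 + 1.0 / n2)
    d = 1.0 / n1 - 1.0 / n2
    t1 = x / math.sqrt(h)
    t2 = -(x * x + 2.0) / 6.0 * d
    t3 = ((x ** 3 + 3.0 * x) / (12.0 * h * math.sqrt(h))
          + (x ** 3 + 11.0 * x) / 144.0 * math.sqrt(h) * d * d)
    return t1, t2, t3

x = NormalDist().inv_cdf(0.95)
t1, t2, t3 = cornish_fisher_terms(x, 40, 60)
order_n1 = t1 + t2          # approximation of order n^-1
order_n32 = t1 + t2 + t3    # approximation including the n^-3/2 term
```

Here `order_n1` comes out near .2309; adding the third term brings the value to within about one unit in the fourth decimal place of the exact .2332. That last figure is this sketch's own arithmetic, not a number stated in the note.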

Since

$$\frac{x}{\sqrt{h - 1}} = \frac{x}{\sqrt{h}} + \frac{x}{2h\sqrt{h}} + \text{terms of order } n^{-2},$$

Fisher's approximation differs from (2) by including a correction term of order
$n^{-3/2}$. Inspection of the true correction terms of this order in equation (2) shows
that for finite values of $n_1$ and $n_2$ the term $\dfrac{x^3 + 11x}{144}\sqrt{h}\left(\dfrac{1}{n_1} - \dfrac{1}{n_2}\right)^2$ is
considerably smaller than the term $\dfrac{x^3 + 3x}{12h\sqrt{h}}$, since the former has a smaller numerical
coefficient and involves the difference between $\dfrac{1}{n_1}$ and $\dfrac{1}{n_2}$. Thus Fisher's
formula gives a close approximation to the true formula of order $n^{-3/2}$, provided
that $\dfrac{x}{2h\sqrt{h}}$ is approximately equal to $\dfrac{x^3 + 3x}{12h\sqrt{h}}$, i.e. if $\dfrac{x^2 + 3}{6}$ is approximately equal
to 1. For the 5 percent level, $x = 1.6449$, and $\dfrac{x^2 + 3}{6} = 0.951$. Thus at the
5 percent level the use of $\sqrt{h - 1}$ in (1) instead of $\sqrt{h}$ extends the validity of
Fisher's approximation from order $n^{-1}$ to order $n^{-3/2}$.

This ingenious device, however, requires adjustment at other levels of
significance. The values of $(x^2 + 3)/6$ at the principal significance levels are
shown below.

$^1$ Fisher and Cornish also gave the two succeeding terms.

Significance level, %       40     30     20     10     5      1      0.1
$\lambda = (x^2 + 3)/6$     0.51   0.55   0.62   0.77   0.95   1.40   2.09


If $\sqrt{h - 1}$ in formula (1) is replaced by $\sqrt{h - \lambda}$, with the above values of $\lambda$,
Fisher's formula will be approximately valid to order $n^{-3/2}$ at all levels of
significance. In particular, for the tables already published of the 20, 1 and 0.1
percent points, $\lambda$ may be taken as 0.6, 1.4 and 2.1 respectively. The values of $z$
given by the use of $\sqrt{h - 1}$ and $\sqrt{h - \lambda}$ are compared below for $n_1 = 24$,
$n_2 = 60$.$^2$


Significance    Approximate formula                     Exact
level           $\sqrt{h - 1}$    $\sqrt{h - \lambda}$  value
20%             .1346             .1337                 .1338
1%              .3723             .3748                 .3746
0.1%            .4875             .4966                 .4955

The use of $\sqrt{h - \lambda}$ gives values practically correct to 4 decimal places,
except for the 0.1 percent level of significance, at which the higher terms become more
important.
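Both small tables above can be recomputed from the formulas in the text. The following sketch uses hypothetical function names and takes the normal deviates from `statistics.NormalDist`; the rounded values $\lambda$ = 0.6, 1.4 and 2.1 are those recommended in the note.

```python
import math
from statistics import NormalDist

def deviate(p):
    """Single-tail normal deviate x for upper-tail probability p."""
    return NormalDist().inv_cdf(1.0 - p)

def lam(p):
    """lambda = (x^2 + 3) / 6 at significance level p."""
    x = deviate(p)
    return (x * x + 3.0) / 6.0

def z_modified(p, n1, n2, lam_value):
    """Formula (1) with sqrt(h - 1) replaced by sqrt(h - lambda)."""
    x = deviate(p)
    h = 2.0 / (1.0 / n1 + 1.0 / n2)
    return x / math.sqrt(h - lam_value) - (x * x + 2.0) / 6.0 * (1.0 / n1 - 1.0 / n2)

# Rows of the comparison table for n1 = 24, n2 = 60:
# (level, value with lambda = 1, value with the recommended rounded lambda)
rows = [(p, z_modified(p, 24, 60, 1.0), z_modified(p, 24, 60, lam_rounded))
        for p, lam_rounded in [(0.20, 0.6), (0.01, 1.4), (0.001, 2.1)]]
```

Setting `lam_value=1.0` recovers Fisher's original formula, so the same function produces both approximate columns of the comparison table.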

With the aid of this formula, complete tabulation of the z-distribution for a
given pair of high values of $n_1$ and $n_2$ is relatively simple. If very low
probabilities at the tails are required, the further approximations given by Fisher and
Cornish [3] may be used.

REFERENCES 

[1] R. A. Fisher. Statistical Methods for Research Workers. Edinburgh, Oliver and Boyd.
1st Ed. 1925.

[2] R. A. Fisher and F. Yates. Statistical Tables. Edinburgh, Oliver and Boyd. 1938.

[3] E. A. Cornish and R. A. Fisher. "Moments and Cumulants in the Specification of
Distributions," Revue de l'Institut International de Statistique, Vol. 4 (1937).

Iowa State College, 

Ames, Iowa. 


$^2$ The numerical terms in the approximate formula given for the 20 percent points on p. 28
of Fisher and Yates' Statistical Tables are in error. Their formula should read:

$$z = \frac{0.8416}{\sqrt{h - 1}} - 0.4514\left(\frac{1}{n_1} - \frac{1}{n_2}\right).$$





A NOTE ON THE ANALYSIS OF VARIANCE WITH UNEQUAL CLASS 

FREQUENCIES 1 

By Abraham Wald* 

Let us consider $p$ groups of variates and denote by $m_j$ ($j = 1, \cdots, p$) the
number of elements in the $j$-th group. Let $x_{ij}$ be the $i$-th element of the $j$-th
group. We assume that $x_{ij}$ is the sum of two variates $\epsilon_{ij}$ and $\eta_j$, i.e.
$x_{ij} = \epsilon_{ij} + \eta_j$, where $\epsilon_{ij}$ ($i = 1, \cdots, m_j$; $j = 1, \cdots, p$) is normally distributed with
mean $\mu$ and variance $\sigma^2$, and $\eta_j$ ($j = 1, \cdots, p$) is normally distributed with
mean $\mu'$ and variance $\sigma'^2$. All the variates $\epsilon_{ij}$ and $\eta_j$ are supposed to be
distributed independently.

The intraclass correlation $\rho$ is given by$^3$

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$

Confidence limits for $\rho$ have been derived only in case of equal class frequencies,
i.e. $m_1 = m_2 = \cdots = m_p$. In this paper we shall deal with the problem of
determining the confidence limits for $\rho$ in the case of unequal class frequencies.
Since $\rho$ is a monotonic function of $\dfrac{\sigma'^2}{\sigma^2}$, our problem is solved if we derive
confidence limits for $\dfrac{\sigma'^2}{\sigma^2}$.

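$^1$ The author is indebted to Professor H. Hotelling for formulating the problem dealt
with in this paper.

$^2$ Research under a grant-in-aid from the Carnegie Corporation of New York.

$^3$ See for instance R. A. Fisher, Statistical Methods for Research Workers, 6-th edition,
p. 228.

Denote by $\bar{x}_j$ the arithmetic mean of the $j$-th group, i.e.

(1)    $\bar{x}_j = \dfrac{\sum_i \epsilon_{ij}}{m_j} + \eta_j.$

Hence the variance of $\bar{x}_j$ is equal to

(2)    $\sigma_{\bar{x}_j}^2 = \dfrac{\sigma^2}{m_j} + \sigma'^2.$

Denote $\dfrac{\sigma'^2}{\sigma^2}$ by $\lambda^2$. Then we have

(3)    $\sigma_{\bar{x}_j}^2 = \sigma^2\left(\dfrac{1}{m_j} + \lambda^2\right) = \dfrac{\sigma^2}{w_j},$

where

(4)    $w_j = \dfrac{m_j}{1 + m_j\lambda^2}.$

Now we shall prove that

(5)    $\dfrac{1}{\sigma^2}\sum_{j=1}^{p} w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2$

has the $\chi^2$-distribution with $p - 1$ degrees of freedom. Let

$$y_j = \sqrt{w_j}\,\bar{x}_j \qquad (j = 1, \cdots, p)$$

and consider the orthogonal transformation

$$y'_1 = L_1(y_1, \cdots, y_p),$$
$$\cdots$$
$$y'_{p-1} = L_{p-1}(y_1, \cdots, y_p),$$
$$y'_p = \frac{\sqrt{w_1}\,y_1 + \cdots + \sqrt{w_p}\,y_p}{\sqrt{w_1 + \cdots + w_p}},$$

where $L_1(y_1, \cdots, y_p), \cdots, L_{p-1}(y_1, \cdots, y_p)$ denote arbitrary homogeneous
linear functions subject to the only condition that the transformation should
be orthogonal.

Since the mean value of $y_j$ is equal to $\sqrt{w_j}\,(\mu + \mu')$ and the variance of $y_j$
is equal to $\sigma^2$, we obviously have: The mean value of $y'_j$ ($j = 1, \cdots, p - 1$)
is equal to zero, the variance of $y'_j$ ($j = 1, \cdots, p$) is equal to $\sigma^2$. In order to
prove our statement, we have only to show that the expression (5) is equal to
$\dfrac{1}{\sigma^2}(y_1'^2 + \cdots + y_{p-1}'^2)$. If we substitute in (5) $\dfrac{y_j}{\sqrt{w_j}}$ for $\bar{x}_j$, we get

(5')
$$\frac{1}{\sigma^2}\sum_j w_j\left(\frac{y_j}{\sqrt{w_j}} - \frac{\sum_j \sqrt{w_j}\,y_j}{\sum_j w_j}\right)^2
= \frac{1}{\sigma^2}\left[\sum_j y_j^2 - 2\,\frac{\left(\sum_j \sqrt{w_j}\,y_j\right)^2}{\sum_j w_j} + \frac{\left(\sum_j \sqrt{w_j}\,y_j\right)^2}{\sum_j w_j}\right]$$
$$= \frac{1}{\sigma^2}\left[\sum_j y_j^2 - \frac{\left(\sum_j \sqrt{w_j}\,y_j\right)^2}{\sum_j w_j}\right]
= \frac{1}{\sigma^2}\left[\sum_j y_j^2 - y_p'^2\right]
= \frac{1}{\sigma^2}\left[\sum_j y_j'^2 - y_p'^2\right]
= \frac{1}{\sigma^2}\left(y_1'^2 + \cdots + y_{p-1}'^2\right).$$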
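The algebraic step in (5') can be checked numerically for arbitrary group means. The sketch below is only an illustration: the function names and the sample numbers are hypothetical, and $\sigma^2$ is dropped since it is a common factor on both sides.

```python
import math

def weights(m, lam2):
    """w_j = m_j / (1 + m_j * lambda^2), as in (4)."""
    return [mj / (1.0 + mj * lam2) for mj in m]

def weighted_sum_of_squares(w, xbar):
    """Left side of (5'), without the factor 1/sigma^2."""
    centre = sum(wj * xj for wj, xj in zip(w, xbar)) / sum(w)
    return sum(wj * (xj - centre) ** 2 for wj, xj in zip(w, xbar))

def via_orthogonal_form(w, xbar):
    """Right side of (5'): sum of y_j^2 minus y'_p^2."""
    y = [math.sqrt(wj) * xj for wj, xj in zip(w, xbar)]
    y_p = sum(math.sqrt(wj) * yj for wj, yj in zip(w, y)) / math.sqrt(sum(w))
    return sum(yj ** 2 for yj in y) - y_p ** 2

w = weights([2, 4, 3, 7], lam2=0.5)          # hypothetical class frequencies
xbar = [1.5, 5.5, 10.25, 4.0]                # hypothetical group means
lhs = weighted_sum_of_squares(w, xbar)
rhs = via_orthogonal_form(w, xbar)
```

The two quantities agree to rounding error, which is the content of the chain of equalities in (5').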





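Since $\dfrac{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2}{\sigma^2}$ has the $\chi^2$ distribution with $N - p$ degrees of freedom, the
expression

(6)    $F = \dfrac{N - p}{p - 1}\cdot\dfrac{\sum_j w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2}$

has the analysis of variance distribution with $p - 1$ and $N - p$ degrees of
freedom, where $N = m_1 + \cdots + m_p$. In case $m_1 = m_2 = \cdots = m_p = m$,
we have

(6')    $F = \dfrac{N - p}{p - 1}\cdot\dfrac{\sum_j (\bar{x}_j - \bar{x})^2}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2}\cdot\dfrac{m}{1 + m\lambda^2},$

where $\bar{x} = \dfrac{\sum_j\sum_i x_{ij}}{N}$. Hence

$$F = \frac{1}{1 + m\lambda^2}\,F^*, \qquad \lambda^2 = \frac{F^* - F}{mF}, \qquad \text{and} \qquad F^* = \frac{N - p}{p - 1}\cdot\frac{m\sum_j (\bar{x}_j - \bar{x})^2}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2}.$$

If $F_1$ denotes the lower and $F_2$ the upper confidence limit of $F$, we obtain for $\lambda^2$
the confidence limits

$$\frac{F^* - F_2}{mF_2} \leq \lambda^2 \leq \frac{F^* - F_1}{mF_1}.$$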




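Let us now consider the general case that $m_1, \cdots, m_p$ are arbitrary positive
integers. First we shall show that the set of values of $\lambda^2$, for which (6) lies
between its confidence limits $F_1$ and $F_2$, is an interval. For this purpose we
have only to show that

$$f(\lambda^2) = \sum_j w_j\left(\bar{x}_j - \frac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2$$

is monotonically decreasing with $\lambda^2$. In fact

$$\frac{df(\lambda^2)}{d\lambda^2} = \sum_j \frac{dw_j}{d\lambda^2}\left(\bar{x}_j - \frac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2,$$

the terms arising from the differentiation of the weighted mean vanishing because
$\sum_j w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right) = 0$. Since

$$\frac{dw_j}{d\lambda^2} = -\frac{m_j^2}{(1 + m_j\lambda^2)^2} = -w_j^2,$$

we have

$$\frac{df(\lambda^2)}{d\lambda^2} = -\sum_j w_j^2\left(\bar{x}_j - \frac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2 < 0,$$

which proves our statement.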





Hence the lower confidence limit $\lambda_1^2$ of $\lambda^2$ is given by the root of the
equation in $\lambda^2$:

(7)    $F = \dfrac{N - p}{p - 1}\cdot\dfrac{f(\lambda^2)}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2} = F_2,$

and the upper confidence limit $\lambda_2^2$ of $\lambda^2$ is given by the root of the equation in $\lambda^2$:

(8)    $F = F_1.$

Since $f(\lambda^2)$ is monotonically decreasing, the equations (7) and (8) have at
most one root in $\lambda^2$. If the equation (7) or (8) has no root, the corresponding
confidence limit has to be put equal to zero. If neither (7) nor (8) has a root,
we have to reject at least one of the hypotheses:

(1) $x_{ij} = \epsilon_{ij} + \eta_j$.

(2) The variates $\epsilon_{ij}$ and $\eta_j$ ($i = 1, \cdots, m_j$; $j = 1, \cdots, p$) are normally and
independently distributed.

(3) Each of the variates $\epsilon_{ij}$ has the same distribution.

(4) Each of the variates $\eta_j$ has the same distribution.

The equations (7) and (8) are complicated algebraic equations in $\lambda^2$. For
the actual calculation of the roots of these equations, well known approximation
methods can be applied, making use also of the fact that the left members are
monotonic functions of $\lambda^2$. In applying any approximation method it is very
useful to start with two limits of the root which do not lie far apart. We shall
give here a method of finding such limits.

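Denote by $\bar{F}$ the function which we obtain from $F$ (formula (6)) by
substituting

$$\frac{l_j}{1 + l_j\lambda^2} \text{ for } w_j \qquad (j = 1, \cdots, p).$$

Let $\bar{f}$ be the function obtained from $f$ by the same process.

Denote by $\varphi(m, \lambda^2)$ the function which we obtain from $\bar{F}$ by substituting $m$
for $l_1, \cdots, l_p$. We shall first show that $\bar{F}$ is non-decreasing with increasing
$l_k$ ($k = 1, \cdots, p$), i.e. $\dfrac{\partial\bar{F}}{\partial l_k} \geq 0$. For this purpose we have only to show that
$\dfrac{\partial\bar{f}}{\partial l_k} \geq 0$. We have, writing $\bar{w}_j = \dfrac{l_j}{1 + l_j\lambda^2}$:

$$\frac{\partial\bar{f}}{\partial l_k} = \frac{1}{(1 + l_k\lambda^2)^2}\left(\bar{x}_k - \frac{\sum_j \bar{w}_j\bar{x}_j}{\sum_j \bar{w}_j}\right)^2 \geq 0.$$

Hence our statement is proved. Denote by $m'$ the smallest and by $m''$ the
greatest of the values $m_1, \cdots, m_p$. Then we obviously have

(9)    $\varphi(m', \lambda^2) \leq F \leq \varphi(m'', \lambda^2).$

Denote by $\lambda_1'^2$, $\lambda_1''^2$, $\lambda_2'^2$, $\lambda_2''^2$ the roots in $\lambda^2$ of the following equations respectively:

$$\varphi(m', \lambda^2) = F_2; \qquad \varphi(m'', \lambda^2) = F_2; \qquad \varphi(m', \lambda^2) = F_1; \qquad \varphi(m'', \lambda^2) = F_1.$$

Since $F$ is monotonically decreasing with increasing $\lambda^2$, on account of (7), (8),
and (9) we obviously have

$$\lambda_1'^2 \leq \lambda_1^2 \leq \lambda_1''^2$$

and

$$\lambda_2'^2 \leq \lambda_2^2 \leq \lambda_2''^2.$$

The above inequalities give us the required limits.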
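The procedure can be sketched in code. Bisection is used here as one possible "well known approximation method"; it is a choice made for this illustration, not one prescribed by the paper. All function names are hypothetical, and the critical values of $F$ (for $p - 1$ and $N - p$ degrees of freedom) are assumed to be supplied by the user from tables. The bracket uses the fact that, when all $l_j$ equal $m$, $\varphi(m, \lambda^2) = A\,m/(1 + m\lambda^2)$, with $A$ computed from the unweighted mean of the group means, so that its root is $A/F - 1/m$.

```python
def group_stats(groups):
    """Class frequencies, class means and within-class sum of squares."""
    m = [len(g) for g in groups]
    xbar = [sum(g) / len(g) for g in groups]
    ssw = sum((x - xb) ** 2 for g, xb in zip(groups, xbar) for x in g)
    return m, xbar, ssw

def wald_F(lam2, groups):
    """F of formula (6) as a function of lambda^2."""
    m, xbar, ssw = group_stats(groups)
    p, N = len(groups), sum(m)
    w = [mj / (1.0 + mj * lam2) for mj in m]
    centre = sum(wj * xj for wj, xj in zip(w, xbar)) / sum(w)
    f = sum(wj * (xj - centre) ** 2 for wj, xj in zip(w, xbar))
    return (N - p) / (p - 1) * f / ssw

def lambda2_root(groups, F_crit, tol=1e-10):
    """Root in lambda^2 of F = F_crit, or 0.0 if no non-negative root exists."""
    if wald_F(0.0, groups) <= F_crit:
        return 0.0
    m, xbar, ssw = group_stats(groups)
    p, N = len(groups), sum(m)
    mean_of_means = sum(xbar) / p
    A = (N - p) / (p - 1) * sum((x - mean_of_means) ** 2 for x in xbar) / ssw
    lo = max(0.0, A / F_crit - 1.0 / min(m))   # root of phi(m', .), clipped at 0
    hi = A / F_crit - 1.0 / max(m)             # root of phi(m'', .)
    for _ in range(200):                       # bisection on the decreasing F
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2.0
        if wald_F(mid, groups) > F_crit:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

data = [[1, 2], [4, 5, 6, 7], [9, 10, 12]]     # hypothetical observations
lam2_at_5 = lambda2_root(data, 5.0)            # 5.0 is an arbitrary illustrative value
```

Calling `lambda2_root` with the upper critical value $F_2$ gives the lower limit $\lambda_1^2$, and with $F_1$ the upper limit $\lambda_2^2$; limits for $\rho$ then follow from $\rho = \lambda^2/(1 + \lambda^2)$.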

Columbia University, 

New York, N. Y. 


THE DISTRIBUTION OF QUADRATIC FORMS IN NON-CENTRAL 
NORMAL RANDOM VARIABLES 

By William G. Madow 1 

The following theorem is the algebraic basis of the theorem of R. A. Fisher
and W. G. Cochran which states necessary and sufficient conditions that a set
of quadratic forms in normally and independently distributed random variables
should themselves be independently distributed in $\chi^2$-distributions.$^2$

Theorem I. If the real quadratic forms $q_1, \cdots, q_m$, in $x_1, \cdots, x_n$, are
such that

(1)    $\sum_\gamma q_\gamma = \sum_\nu x_\nu^2,$

and if the rank of $q_\gamma$ is $n_\gamma$, then a necessary and sufficient condition that

(2)    $q_\gamma = \sum_\alpha z_\alpha^2,$

where the real linear functions $z_\beta$ of the $x_\nu$ are defined by

(3)    $x_\nu = \sum_\beta c_{\nu\beta} z_\beta$

is

(4)    $n' = n.$

Furthermore the system of linear forms (3) constitute an orthogonal transformation.

$^1$ The letters $i$, $j$, $\mu$, $\nu$ will assume all integral values from 1 through $n$, the letter $\gamma$ will
assume all integral values from 1 through $m$ ($n \geq m$), the letter $\alpha$ will assume all integral
values from $n_1 + \cdots + n_{\gamma-1} + 1$ through $n_1 + \cdots + n_\gamma$ ($n_0 = 0$, $n_1 + \cdots + n_m = n'$),
the letters $\beta$, $\beta'$ will assume all integral values from 1 through $n'$, and the letters $r$, $s$ will
assume all integral values from 1 through $n - 1$.

$^2$ The references are: W. G. Cochran, "The Distribution of Quadratic Forms in a Normal
System, with Applications to the Analysis of Covariance," Proc. Camb. Phil. Soc., Vol.
30 (1934), pp. 178-191, and R. A. Fisher, "Applications of 'Student's' Distribution,"
Metron, Vol. 5 (1928), pp. 90-104.

Proof: Necessity. Since the rank of a sum of quadratic forms is less than
or equal to the sum of their ranks, it follows that $n' \geq n$. Upon substituting
from (3) for the $x$'s in (1), and using (2), it is seen that, for all values of the $z$'s,

$$\sum_\beta z_\beta^2 = \sum_{\beta,\beta'}\left(\sum_\nu c_{\nu\beta}c_{\nu\beta'}\right) z_\beta z_{\beta'}$$

and hence, from (1), it follows that

(5)    $\sum_\nu c_{\nu\beta}c_{\nu\beta'} = \delta_{\beta\beta'}$

where $\delta_{\beta\beta'} = 0$ if $\beta \neq \beta'$, and $\delta_{\beta\beta'} = 1$ if $\beta = \beta'$. However, since the rank
of the system of linear forms (3) is not greater than $n$, and since the matrix
of (5) is the product of the matrix of (3) by its transposed matrix, it follows
that (5) can be true only if $n'$ is not greater than $n$. Consequently $n' = n$.
It then is an immediate result of (5) that the transformation (3) is orthogonal.

Sufficiency. We assume that $n' = n$. By a real linear transformation of
$x_1, \ldots, x_n$ we obtain linear forms $z_\nu$ such that

$q_\gamma = \sum_\alpha c_\alpha z_\alpha^2$,

where $c_\alpha = 1$ or $-1$. The set of linear functions $z_1, \ldots, z_n$ are linearly
independent, for if $z_n \not\equiv 0$, and if real numbers $h_1, \ldots, h_{n-1}$, not all zero, exist such
that, say,

$z_n = \sum_r h_r z_r$,

then

$\sum_\nu c_\nu z_\nu^2 = \sum_{r,s} H_{rs} z_r z_s$.

Substituting, we have

$\sum_\gamma q_\gamma = \sum_\nu c_\nu z_\nu^2 = \sum_{r,s} \sum_{\mu,\nu} H_{rs} c^{r\mu} c^{s\nu} x_\mu x_\nu$,

where $z_r = \sum_\mu c^{r\mu} x_\mu$. (It is not assumed here that the matrix of the $c^{r\mu}$ is the
inverse of the matrix of the $c_{\mu r}$. That fact is a consequence of this proof.)
Denoting the matrix of $z_1, \ldots, z_{n-1}$ by $C_n$ we see that the matrix of $\sum_\gamma q_\gamma$ is
$C_n' H C_n$ where $H$ is the matrix of the $H_{rs}$ and has rank less than or equal to $n - 1$,
which contradicts the hypothesis. Hence if $C$ is the matrix having the elements
$c_\nu$ in its main diagonal and zeros elsewhere and if $C_n$ is the matrix of $z_1, \ldots, z_n$,
it follows that

$C_n' C C_n = I$,

where $I$ is the identity matrix, i.e. the matrix having ones in the main diagonal
and zeros elsewhere, and $C_n$ is non-singular. Then $C = (C_n')^{-1} C_n^{-1}$ and hence $C$ is
the identity matrix and $C_n$ is orthogonal.

Among the hypotheses of the Fisher-Cochran theorem is the hypothesis that
the mean value of $x_\mu$ is 0, and the variance of $x_\mu$ is $\sigma^2$. However, in connection
with his analysis of the distribution of the multiple correlation coefficient,³
R. A. Fisher derived the distribution of the sum of the squares of $n$ independently
distributed random variables $x_1, \ldots, x_n$, the probability density of $x_\mu$ being
given by

(6) $p(x_\mu) = (2\pi\sigma^2)^{-1/2} \exp\Big[ -\frac{1}{2\sigma^2} (x_\mu - a_\mu)^2 \Big]$.

More recently, P. C. Tang⁴ has used the distribution of the sum of non-central
squares in his study of the power function of the analysis of variance test.

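In this note we extend the Fisher-Cochran theorem to non-central random
variables. If the random variables are independently distributed with
probability densities given by (6), Fisher and Tang have shown that if
$\chi'^2 = \frac{1}{\sigma^2} \sum_\mu x_\mu^2$, then the probability density of $\chi'^2$ is given by

(7) $p(\chi'^2) = \tfrac{1}{2}\, e^{-\lambda - \frac{1}{2}\chi'^2} \big(\tfrac{1}{2}\chi'^2\big)^{\frac{1}{2}n - 1} \sum_{\nu=0}^{\infty} \frac{(\frac{1}{2}\lambda\chi'^2)^\nu}{\nu!\, \Gamma(\frac{1}{2}n + \nu)}$,

where $\lambda = \frac{1}{2\sigma^2} \sum_\mu a_\mu^2$.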


We now give necessary and sufficient conditions that a set of quadratic forms
in normally and independently distributed random variables should themselves
be independently distributed in $\chi'^2$-distributions.
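As a numerical aid, the series form of the non-central density can be evaluated directly. The sketch below assumes the standard series with the prefactor $\tfrac{1}{2} e^{-\lambda - x/2} (x/2)^{n/2-1}$ and $\lambda = \sum_\mu a_\mu^2 / 2\sigma^2$; the function names and the values of `n` and `lam` are illustrative choices, not taken from the paper.

```python
import math

def noncentral_chi2_pdf(x, n, lam):
    """Series form of the non-central chi-square density at x > 0."""
    half_x = 0.5 * x
    # nu = 0 term of the series: 1 / Gamma(n/2)
    term = 1.0 / math.gamma(0.5 * n)
    total = term
    nu = 0
    # Each term follows from the previous one through the ratio
    # (lam * x / 2) / ((nu + 1) * (n/2 + nu)).
    while term > 1e-17 * total:
        term *= lam * half_x / ((nu + 1) * (0.5 * n + nu))
        total += term
        nu += 1
    return 0.5 * math.exp(-lam - half_x) * half_x ** (0.5 * n - 1.0) * total

def integrate(f, a, b, steps):
    """Midpoint rule; it never evaluates f at the endpoint x = 0."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

n, lam = 4, 1.5  # illustrative degrees of freedom and parameter
total_mass = integrate(lambda x: noncentral_chi2_pdf(x, n, lam), 0.0, 80.0, 8000)
mean = integrate(lambda x: x * noncentral_chi2_pdf(x, n, lam), 0.0, 80.0, 8000)
```

Under these assumptions the density should integrate to one and have mean $n + 2\lambda$; with $\lambda = 0$ it reduces to the ordinary $\chi^2$ density.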

Theorem II. Let $x_1, \ldots, x_n$ be independently distributed random variables,
the random variable $x_\mu$ having probability density (6). Denote $\sum_\mu x_\mu^2$ by $q$, and
denote $\frac{1}{2\sigma^2} \sum_\mu a_\mu^2$ by $\lambda$. Let $q_1, \ldots, q_m$ be quadratic forms,

$q_\gamma = \sum_{\mu,\nu} a^\gamma_{\mu\nu} x_\mu x_\nu$,

such that $\sum_\gamma q_\gamma = q$, and let the rank of $q_\gamma$ be denoted by $n_\gamma$.


³ R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation
Coefficient," Proc. Royal Soc. of London, (A), Vol. 121 (1928), pp. 664-673.

⁴ P. C. Tang, "The Power Function of the Analysis of Variance Tests with Tables and
Illustrations of their Use," Statistical Research Memoirs, Vol. 2 (1938), pp. 126-149.





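A necessary and sufficient condition that the quadratic forms $\chi'^2_\gamma = q_\gamma/\sigma^2$
be independently distributed with joint probability density

(8) $p(\chi'^2_1, \ldots, \chi'^2_m) = \prod_\gamma p(\chi'^2_\gamma)$,

where $p(\chi'^2_\gamma)$ is given by (7) with $n_\gamma$ and $\lambda_\gamma$ in place of $n$ and $\lambda$, and

(9) $\lambda_\gamma = \frac{1}{2\sigma^2} \sum_{\mu,\nu} a^\gamma_{\mu\nu} a_\mu a_\nu$,

is $n' = n$.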

Proof. Necessity. Tang⁵ has shown that the distribution of $\chi'^2$ is given
by (7) and that if the $\chi'^2_\gamma$ have joint distribution (8), then the distribution of
$\chi'^2_1 + \cdots + \chi'^2_m$ $(= \chi'^2)$ is (7) with $n'$ in place of $n$. Upon comparing terms,
we see that $n' = n$.

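Sufficiency. By Theorem I there exist $n$ orthogonal linear functions (3) such
that (2) is true. Then it is easy to see that the random variables $z_1, \ldots, z_n$
are independently distributed with a joint probability density

(10) $p(z_1, \ldots, z_n) = (2\pi\sigma^2)^{-n/2} \exp\Big[ -\frac{1}{2\sigma^2} \sum_\nu (z_\nu - a'_\nu)^2 \Big]$,

where

$\sum_\nu a_\nu'^2 = \sum_\mu a_\mu^2$, and $a'_\nu = \sum_\mu c_{\mu\nu} a_\mu$.

If we set $2\sigma^2 \lambda_\gamma = \sum_\alpha a_\alpha'^2$, then we have, from (7) and (10), that the $\chi'^2_\gamma$ are
independently distributed with joint probability density (8). It is only necessary
to show that $\sum_\alpha a_\alpha'^2 = \sum_{\mu,\nu} a^\gamma_{\mu\nu} a_\mu a_\nu$ in order to complete the proof of the
theorem. Now

$\sum_{\mu,\nu} a^\gamma_{\mu\nu} a_\mu a_\nu = \sum_{i,j} \Big( \sum_{\mu,\nu} a^\gamma_{\mu\nu} c_{\mu i} c_{\nu j} \Big) a'_i a'_j$.

On the other hand, by direct substitution for the $z$'s we see that

$q_\gamma = \sum_\alpha z_\alpha^2 = \sum_{\mu,\nu} \Big( \sum_\alpha c_{\mu\alpha} c_{\nu\alpha} \Big) x_\mu x_\nu$

and hence $a^\gamma_{\mu\nu} = \sum_\alpha c_{\mu\alpha} c_{\nu\alpha}$. Since (3) is an orthogonal transformation,

$\sum_{\mu,\nu} a^\gamma_{\mu\nu} c_{\mu i} c_{\nu j} = \sum_{\mu,\nu} \Big( \sum_\alpha c_{\mu\alpha} c_{\nu\alpha} \Big) c_{\mu i} c_{\nu j} = \sum_\alpha \delta_{\alpha i} \delta_{\alpha j}$,

where $\delta_{\alpha i} = 0$ if $\alpha \ne i$ and $\delta_{\alpha i} = 1$ if $\alpha = i$, which completes the proof.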

It is emphasized that the form of $\lambda_\gamma$ makes it unnecessary to calculate the
matrix of $q_\gamma$ to determine $\lambda_\gamma$, since the values $a_\nu$ need only be substituted for the
$x_\nu$ in the original expression for $q_\gamma$ to determine $\lambda_\gamma$.

Washington, D. C. 



⁵ See footnote 4, p. 140.




TWO PROPERTIES OF SUFFICIENT STATISTICS 


By Louis Olshevsky 

The concept of sufficient statistics was introduced by R. A. Fisher in 1922.
It was refined and extended in 1936 by Neyman and Pearson who gave definitions
of shared sufficient statistics and sufficient sets of algebraically independent
statistics.¹ Today the concept plays an important part in the theory of the
subject. Characterized briefly, a statistic associated with a single or specific
population parameter is sufficient when no other statistic calculated from the
same sample sheds any additional light on the value of the parameter. We
shall prove that sets of sufficient statistics possess certain interconnections so
that when one set is known every other set with a like number of members and
linked with the same population parameters is discoverable.

Theorem I. If $T_1, \ldots, T_m$ are a set of $m$ ($m \le n$) algebraically independent
sufficient statistics with regard to the parameters $\theta_1, \ldots, \theta_q$ and the probability law
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, a necessary and sufficient condition for the
sufficiency of any set of $m$ algebraically independent statistics $T'_1, \ldots, T'_m$ with
regard to the same parameters and the same probability distribution is that the $T'_i$
be a set of independent functions of the $T_j$ $(i, j = 1, \ldots, m)$.

Proof: As an adjunct in the demonstration we cite the following theorem
due to Neyman.² For a set of algebraically independent statistics $T_1, \ldots, T_m$
to be a sufficient set with regard to the parameters $\theta_1, \ldots, \theta_q$, it is necessary
and sufficient that in any point of sample space, except perhaps for a set of
measure zero, it should be possible to present the probability law in the form
of the product

(1) $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q) \cdot \phi(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l)$,

where $p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)$ is the probability law of $T_1, \ldots, T_m$ and
the function $\phi$ does not depend upon $\theta_1, \ldots, \theta_q$.

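The sufficiency of the condition stated in the hypothesis of Theorem I is now
immediately evident. For, if $p'$ and $\phi'$ refer to the second set of algebraically
independent statistics and $T'_i = T'_i(T_1, \ldots, T_m)$ where the functions are
independent, the relations can be solved for the $T_j$ in terms of the $T'_i$ giving
$T_j = T_j(T'_1, \ldots, T'_m)$. Then

$p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q) = p[T_1(T'_1, \ldots, T'_m), \ldots, T_m(T'_1, \ldots, T'_m) \mid \theta_1, \ldots, \theta_q] \, \frac{\partial(T_1, \ldots, T_m)}{\partial(T'_1, \ldots, T'_m)}$,

$\phi'(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l) = \phi(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l) \, \frac{\partial(T'_1, \ldots, T'_m)}{\partial(T_1, \ldots, T_m)}$,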


¹ See Neyman and Pearson: "Sufficient Statistics and Uniformly Most Powerful Tests
of Statistical Hypotheses," Statistical Research Memoirs of the University of London, June
1936. The notation of the present paper is taken from this article.

² See Neyman's article in the Giornale dell'Istituto Italiano degli Attuari, Vol. VI,
No. 4 (1935) as well as the memoir referred to in footnote 1.





and

(2) $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q) \cdot \phi'(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l)$.


Proof of the necessity is somewhat more involved. Since the $T_i$ and $T'_i$ are
both sets of algebraically independent statistics with regard to $\theta_1, \ldots, \theta_q$,
equations (1) and (2) are satisfied. They are, in fact, identities when the
values of $T_1, \ldots, T_m$ and $T'_1, \ldots, T'_m$ in terms of the $x_i$ are substituted.
Division of (1) by (2) and multiplication leads to the equation

(3) $\frac{p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)}{p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)} = \frac{\phi'(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l)}{\phi(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l)}$.

The right side of (3) is free of $\theta_1, \ldots, \theta_q$. Therefore, in reality the left side
must be too. If some or all of the parameters $\theta_1, \ldots, \theta_q$ enter formally into
the left side, we can choose $m + 1$ sets of values $\theta_1^i, \ldots, \theta_q^i$ $(i = 1, \ldots, m + 1)$
such that each of the $m + 1$ functions
$p(T_1, \ldots, T_m \mid \theta_1^i, \ldots, \theta_q^i) \div p'(T'_1, \ldots, T'_m \mid \theta_1^i, \ldots, \theta_q^i)$
differs formally from all of the others. We can, then,
since each is equal to the right side of (3) which is free of $\theta_1, \ldots, \theta_q$, equate
any one of these functions to the remaining $m$ in turn. This provides $m$
independent equations whose very existence proves that the $T'_i$ are functions of the
$T_j$ and vice versa.

If none of the parameters $\theta_1, \ldots, \theta_q$ enters formally into the left side of (3),
$p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)$ must be of the form $p(T_1, \ldots, T_m)\, g(\theta_1, \ldots, \theta_q)$
and $p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)$ of the form $p'(T'_1, \ldots, T'_m)\, g(\theta_1, \ldots, \theta_q)$.
In this case the original probability law $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$
contains $\theta_1, \ldots, \theta_q$ only nominally and there can be no talk of any statistics
designed to estimate these parameters either singly or in combination.

When $m = 1$ and the set of algebraically independent statistics reduces to
one, the single statistic is termed a shared sufficient statistic of the parameters
$\theta_1, \ldots, \theta_q$.³ For this special case, Theorem I can be restated as follows. If
$T$ is a shared sufficient statistic with regard to the population parameters
$\theta_1, \ldots, \theta_q$ and the probability distribution $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$,
the necessary and sufficient condition for the sufficiency of any statistic $T'$
with regard to the same parameters and the same probability distribution is
that $T'$ be a function of $T$. When $m$ and $q$ both equal one, the statistic becomes
a sufficient statistic in the sense originally defined by Fisher in 1922.
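The case $m = q = 1$ can be checked on a small discrete example. The sketch below assumes independent Bernoulli trials with parameter $\theta$ and the statistic $T = x_1 + \cdots + x_n$; it verifies that the factor $\phi = p(x \mid \theta) / p(T \mid \theta)$ takes the same value for several values of $\theta$. The sample and the values of $\theta$ are hypothetical.

```python
from fractions import Fraction
from math import comb

def sample_prob(x, theta):
    """p(x_1, ..., x_n | theta) for independent Bernoulli(theta) trials."""
    t = sum(x)
    return theta ** t * (1 - theta) ** (len(x) - t)

def statistic_prob(t, n, theta):
    """Probability law of T = x_1 + ... + x_n (binomial)."""
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

x = (1, 0, 1, 1, 0)  # hypothetical sample
n, t = len(x), sum(x)
thetas = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]

# phi = p(x | theta) / p(T | theta); sufficiency of T means the set
# below collapses to a single value that is free of theta.
phi_values = {sample_prob(x, th) / statistic_prob(t, n, th) for th in thetas}
```

Any one-to-one function of $T$, such as the sample mean $T/n$, has the same probability law relabelled, so the same factor results, in line with the restated theorem.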

³ See the memoir mentioned in footnote 1.

A physical law is independent of the coordinate system used to express it.
This fact is taken account of in modern physics through the employment of
tensors. One might hope for a parallel situation in the relation between sufficient
statistics and the probability law to which they refer. Given any $l$-parameter
family of distribution laws $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_l)$, the substitution
$\theta_i = \theta_i(\theta'_1, \ldots, \theta'_l)$ $(i = 1, \ldots, l)$ leads to the equally valid representation
of the family

$p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_l) = p[x_1, \ldots, x_n \mid \theta_1(\theta'_1, \ldots, \theta'_l), \ldots, \theta_l(\theta'_1, \ldots, \theta'_l)]$.

Is a set of statistics sufficient with respect to the first representation also
sufficient with respect to the second? The answer is partly in the affirmative and
is given by the following proposition.

Theorem II. If the set of algebraically independent statistics $T_1, \ldots, T_m$ is
sufficient with regard to the parameters $\theta_1, \ldots, \theta_q$ and the probability law
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, it is also sufficient with regard to $\theta'_1, \ldots, \theta'_q$
and any other representation $p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_q, \ldots, \theta'_l)$ of the same
probability law provided $\theta'_i$ $(i = 1, \ldots, q)$ are independent functions of $\theta_1, \ldots, \theta_q$
only and $\theta'_j$ $(j = q + 1, \ldots, l)$ are functions of $\theta_{q+1}, \ldots, \theta_l$ only.

Proof: The proof of the theorem is obvious. We are given the fact that
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q) \cdot \phi(x_1, \ldots, x_n; \theta_{q+1}, \ldots, \theta_l)$.
Since the $\theta'_i$ $(i = 1, \ldots, q)$ are functions of $\theta_1, \ldots, \theta_q$
only and the $\theta'_j$ $(j = q + 1, \ldots, l)$ are functions of $\theta_{q+1}, \ldots, \theta_l$ only, it follows
that $\theta_i = \theta_i(\theta'_1, \ldots, \theta'_q)$ $(i = 1, \ldots, q)$ and $\theta_j = \theta_j(\theta'_{q+1}, \ldots, \theta'_l)$
$(j = q + 1, \ldots, l)$. Consequently,

(4) $p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_q, \ldots, \theta'_l) = p'(T_1, \ldots, T_m \mid \theta'_1, \ldots, \theta'_q) \cdot \phi'(x_1, \ldots, x_n; \theta'_{q+1}, \ldots, \theta'_l)$

and the theorem is established.

New York, N. Y. 

NOTE ON THE MOMENTS OF A BINOMIALLY DISTRIBUTED VARIATE 

By W. D. Evans 

J. A. Joseph has given two interesting triangular arrangements of numbers,
the second of which is reproduced herewith as Table 1.¹ The successive rows
in this table are the coefficients in the expansion of $x^n$ as a function of the
factorials $x^{(i)}$, using the notation of the calculus of finite differences. For example,

$x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x$,

where

$x^{(i)} = x(x - 1)(x - 2) \cdots (x - i + 1)$.

Joseph points out that the coefficients may be used to generate the numbers
of Laplace.

¹ J. A. Joseph, "On the Coefficients of the Expansion of $X^{(n)}$," Annals of Math. Stat.,
Vol. X (1939), p. 293.





A general expression defining any of the coefficients in terms of its place of
occurrence in Table 1 may be set up. If we denote by $F_c(r)$ the number in
row $r$ and column $c$ of the table, we have

(1) $F_c(r) = \sum_{k_1=1}^{r-c+1} k_1 \sum_{k_2=1}^{k_1} k_2 \cdots \sum_{k_{c-1}=1}^{k_{c-2}} k_{c-1}$  $(r \ge c)$.


TABLE 1

 r \ c      1       2       3       4       5     ...     c

   1        1
   2        1       1
   3        1       3       1
   4        1       6       7       1
   5        1      10      25      15       1
  ...
   r      F_1(r)  F_2(r)  F_3(r)  F_4(r)  F_5(r)  ...   F_c(r)

This expression is of additional interest since the numbers defined by it are
likewise the coefficients in the expression of the $z$-th moment about the origin
of a binomially distributed variate in terms of the probability of the variate and
the size of the sample in which it is contained. For example, it may be easily
verified that if $a$ is such a variate, $p$ its probability of occurrence, and $n$ the size
of the sample in which it is contained,

$E(a^2) = n^{(2)}p^2 + np$,

$E(a^3) = n^{(3)}p^3 + 3n^{(2)}p^2 + np$,

$E(a^4) = n^{(4)}p^4 + 6n^{(3)}p^3 + 7n^{(2)}p^2 + np$,

and so on.

Ordinarily, computation of the higher moments of a binomially distributed
variate is a tedious process of repeated differentiation. However, equation (1)
immediately permits us to generalize the foregoing expressions to give the $z$-th
moment of $a$ as follows:

(2) $E(a^z) = \sum_{i=0}^{z-1} n^{(z-i)} p^{z-i} \sum_{k_1=1}^{z-i} k_1 \sum_{k_2=1}^{k_1} k_2 \cdots \sum_{k_i=1}^{k_{i-1}} k_i$.

It will be noted that when $c - 1$ in equation (1) and $i$ in equation (2) are equal
to zero, the repeated summations vanish to be replaced by the value one.
By means of equation (2) much of the labor usually involved in expressing 
the z-th moment about the origin of a binomially distributed variate in terms 
of n and p may be avoided. 
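The two equations lend themselves to a direct check. The sketch below reads the repeated summation as a sum of products $k_1 k_2 \cdots$ over $k_1 \ge k_2 \ge \cdots \ge 1$, rebuilds the first five rows of Table 1 from equation (1), and compares equation (2) with a moment computed straight from the binomial distribution. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def nested_sum(upper, depth):
    """Sum of k_1 * ... * k_depth over upper >= k_1 >= ... >= k_depth >= 1."""
    if depth == 0:
        return 1  # an empty repeated summation is replaced by one
    return sum(k * nested_sum(k, depth - 1) for k in range(1, upper + 1))

def F(c, r):
    """Entry in row r, column c of Table 1, from equation (1)."""
    return nested_sum(r - c + 1, c - 1)

def falling(n, i):
    """Factorial n^(i) = n (n - 1) ... (n - i + 1)."""
    out = 1
    for j in range(i):
        out *= n - j
    return out

def moment_formula(z, n, p):
    """z-th moment about the origin from equation (2)."""
    return sum(falling(n, z - i) * p ** (z - i) * nested_sum(z - i, i)
               for i in range(z))

def moment_direct(z, n, p):
    """z-th moment obtained by summing a^z over the binomial distribution."""
    return sum(a ** z * comb(n, a) * p ** a * (1 - p) ** (n - a)
               for a in range(n + 1))

table = [[F(c, r) for c in range(1, r + 1)] for r in range(1, 6)]
```

Using `Fraction` for `p` keeps the comparison exact, so agreement between `moment_formula` and `moment_direct` is not an artifact of rounding.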

Washington, D. C. 



REPORT OF THE ANNUAL MEETING OF THE INSTITUTE 

The fifth annual meeting of the Institute of Mathematical Statistics was
held in Philadelphia, Pennsylvania, on December 27 and 28, 1939, in conjunction
with the meetings of the American Statistical Association, the Econometric
Society, and the American Sociological Society. The program for the meeting
was arranged by Professor C. C. Craig.

On Wednesday morning, December 27, the Institute held a session devoted to 
contributed papers on Statistical Theory and Methodology. Professor P. R. 
Rider, President of the Institute, presided. At that time the following papers 
were presented: 

1. On the unbiased character of certain likelihood-ratio tests when applied to normal systems.

Joseph F. Daly, The Catholic University of America. 

2. The product seminvariants of the mean and a central moment in samples. 

C. C. Craig, University of Michigan. 

3. A method for minimizing the sum of absolute values of deviations.

Robert Singleton, Princeton Local Government Survey. 

4. On certain criteria for testing the homogeneity of k estimates of variance. 

C. Eisenhart and Frieda S. Swed, University of Wisconsin. 

5. On a test whether two samples are from the same population. 

A. Wald and J. Wolfowitz, Columbia University and Brooklyn, New York. 

6. The power functions of certain tests of significance in harmonic analysis and lag correlation.

William G. Madow, Washington, D. C. 

7. Some theoretical aspects of the use of transformations in the statistical analysis of replicated experiments.

W. G. Cochran, Iowa State College. 

8. The standard errors of geometric and harmonic types of index numbers. 

Nilan Norris, Hunter College. 

9. A study of R. A. Fisher's z distribution and the related F distribution.

L. A. Aroian, Hunter College. 

10. A note on the analysis of variance with unequal class frequencies. 

Abraham Wald, Columbia University. 

11. An approach to problems involving disproportionate frequencies. 

Burton D. Seeley, U. S. Department of Labor. 

Abstracts of these papers are given at the close of this report. 

Immediately following the session just described, the Institute held its annual
business meeting. At that time President Rider announced that the newly
elected officers for the year 1940 are: President, S. S. Wilks, Princeton University;
Vice-Presidents: C. C. Craig, University of Michigan, and A. T. Craig,
University of Iowa; Secretary-Treasurer: P. R. Rider, Washington University.

At one o'clock on the same day, members of the Institute and their guests
attended the annual luncheon. At the luncheon, Professor B. H. Camp addressed
the Institute on Non-standard Deviations.

On Wednesday afternoon, the Institute met jointly with the American Statistical
Association for a program devoted to Lag Effects in Statistics and Economics.
Professor J. D. Tamarkin presided and at this time the following
papers were read:

1. Lag effects in statistics and related problems. 

A. J. Lotka, Metropolitan Life Insurance Company. 

2. Some methods in the analysis of lag effects. 

H. T. Davis, Northwestern University. 

3. Lag effects in economics. 

Charles F. Roos, Institute of Applied Econometrics, Inc. 

A joint session with the Biometric Section of the American Statistical Association
was held on Wednesday evening, Professor George W. Snedecor presiding.
The papers presented at this session, which dealt with Design and Analysis of
Replicated Experiments, were the following:

1. Practical difficulties met in the use of experimental designs.

A. E. Brandt, Soil Conservation Service.

2. Factorial design and covariance in the biological assay of vitamin D.

C. I. Bliss, Sandusky, Ohio.

3. Combinatorial problems in the design of experiments.

Gertrude M. Cox, Iowa State College.

4. Experimental trials with balanced incomplete blocks.

W. J. Youden, Boyce Thompson Institute.

On Thursday afternoon the Institute held consecutively joint sessions with
the American Sociological Society and the Econometric Society. At the first of
these, Professor William F. Ogburn presided and the following program was
presented:

1. How the mathematician can help the sociologist. 

Samuel A. Stouffer, University of Chicago. 

2. Some problems of combinations and permutations as they apply to a comprehensive classification of social groups.

George A. Lundberg, Bennington College.

Discussion: C. C. Craig, University of Michigan.

Philip M. Hauser, U. S. Bureau of the Census.

At the second session the topic for discussion was Recent Advances in Business 
Cycle Analysis and these papers were given: 

1. Recursive methods in business cycle analysis. 

Merrill M. Flood, Princeton Surveys. 

2. An appreciation of some recent mathematical business cycle theories. 

Gerhard Tintner, Iowa State College. 

3. The statisticians' new clothiers.

Arne Fisher, Western Union Telegraph Company.


Paul R. Rider, Secretary.



ABSTRACTS OF PAPERS 

(Presented on December 27, 1939, at the Philadelphia meeting of the Institute) 

On the Unbiased Character of Certain Likelihood-Ratio Tests when Applied to 
Normal Systems. Joseph F. Daly, The Catholic University of America. 

Consider a random sample of N observations on a set of variates x 1 , • • • , where
x l , • • • , x? are assumed to be normally distributed about means which are linear functions 
m i - 2b x ,x* of the fixed variates jr*+ l , • • • , x*. One is sometimes required to decide whether 
the sample tends to contradict the further hypothesis, H 0 , that the coefficients belonging 
to a certain subset of the fixed variates, say x , x^ k } have the specific values 6* 0 • 
Such a situation occurs, for example, in the generalized analysis of variance. In this paper 
it is shown that the Neyman-Pearson method of the ratio of likelihoods yields a test of $H_0$
which is (at least locally) unbiased; in other words, this test is less likely to reject $H_0$ when
the sample is in fact drawn from a normal population in which « b j 0 than when it is drawn 
from a normal population in which the b J are different from but sufficiently close to b * 0 . 
In the special cases k *■ 1 or h — 1 the proof goes through even without the restriction that 
the true 6* be close to &*„, a result which is also implicit in the papers by P. C. Tang and 
P. L. Hsu ( Stat . Res. Mem. Vol. 2). 

Similarly with respect to the hypothesis Hi that the deviations x' — Xb,x 9 fall into 
certain mutually independent sets the X-test is at least locally unbiased; and it has the 
additional property that the expected value of any positive integral power of \/\ is greater 
when Hj is true than when the sample is drawn from any other normal population. 

The Product Seminvariants of the Mean and a Central Moment in Samples. 

C. C. Craig, The University of Michigan. 

The method used by the author in calculating the product seminvariants of a pair of 
central moments in samples is not adapted without modification to the present problem. 
In the present paper the necessary modification is developed which gives a routine method 
for the calculation of these sampling distribution characteristics. The calculation is a 
little heavier than in the previous case but the results for the mean and the second, third, 
and fourth central moments are given up to the fourth order except in one case in which the 
weight is 13. It is planned to follow this with a further study of the distribution of Fisher's 
t in samples from a normal population. 

A Method for Minimizing the Sum of Absolute Values of Deviations. Robert 
Singleton, Princeton Local Government Survey. 

E. C. Rhodes ( Philosophical Magazine , May 1930) presented a method for the estimation 
of parameters in a linear regression where it is desired to minimize the sum of absolute 
values of the deviations. In this paper the structure of the deviation surface is analyzed 
and a method of steepest descent is developed which for computational purposes is an 
improvement over Rhodes' method. The process is finite and leads to an exact solution. 
The method and the formulae used are such as to permit the successive additions of new
observations or sets of observations to the original data, or the exclusion of an observation
from the original set, and the determination of the parameters for the sets of data so derived,
with little additional labor.



On Certain Criteria for Testing the Homogeneity of k Estimates of Variance.
C. Eisenhart and Frieda S. Swed, University of Wisconsin.

Given $k$ variance estimates $s_1^2, s_2^2, \ldots, s_k^2$ with $n_r s_r^2/\sigma_r^2$ $(r = 1, 2, \ldots, k)$ independently
distributed as $\chi^2$ for $n_r$ degrees of freedom, tests of the hypothesis, $H_0$, that $\sigma_r^2 = \sigma^2$
$(r = 1, 2, \ldots, k)$, where $\sigma^2$ is unknown, have been based to date on one or the other of the
quantities

$Q_1 = \sum_r n_r (s_r^2 - s^2)^2 / 2s^4$

$Q_2 = w \log (n s^2 / w) - \sum_r w_r \log (n_r s_r^2 / w_r)$

where the $w_r$ are weights, $w = \sum w_r$, $n = \sum n_r$, and $n s^2 = \sum n_r s_r^2$. A. E. Brandt and
W. L. Stevens have advocated the use of $Q_1$, referring an observed value of $Q_1$ to the $\chi^2$
distribution for $k - 1$ degrees of freedom. J. Neyman, E. S. Pearson, B. L. Welch, and
M. S. Bartlett have advocated tests based on $Q_2$, Bartlett definitely proposing the use of
degrees of freedom as weights, i.e. $w_r = n_r$, and recent work of E. J. G. Pitman and others
has shown that unless $w_r = n_r$ tests based on $Q_2$ are biased. (A statistical test of an hypothesis
$H$ is said to be unbiased when the probability of rejecting $H$ by its use is a minimum
when $H$ is true; obviously a desirable property.) When $w_r = n_r$ Bartlett has suggested that
the distribution of $Q_2$ can be satisfactorily approximated by referring
$Q_2 \Big/ \Big\{ 1 + \frac{1}{3(k-1)} \Big( \sum_r \frac{1}{n_r} - \frac{1}{n} \Big) \Big\}$
to the $\chi^2$ distribution for $k - 1$ degrees of freedom. In this paper we discuss
the adequacy of the $\chi^2$ distribution to describe the distribution of $Q_1$ and of the adjusted
$Q_2$ when the degrees of freedom, $n_r$, are small.
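The two criteria are simple to compute side by side. The sketch below assumes the forms $Q_1 = \sum n_r (s_r^2 - s^2)^2 / 2s^4$ and, with weights $w_r = n_r$, $Q_2 = n \log s^2 - \sum n_r \log s_r^2$, and it assumes Bartlett's usual divisor $1 + \frac{1}{3(k-1)}\big(\sum 1/n_r - 1/n\big)$ for the adjustment. The function name and the input values are hypothetical.

```python
import math

def homogeneity_criteria(s2, dof):
    """Return (Q1, adjusted Q2) for variance estimates s2 with degrees of freedom dof."""
    k = len(s2)
    n = sum(dof)
    # Pooled estimate s^2, defined by n * s^2 = sum of n_r * s_r^2.
    pooled = sum(n_r * s for n_r, s in zip(dof, s2)) / n
    q1 = sum(n_r * (s - pooled) ** 2 for n_r, s in zip(dof, s2)) / (2 * pooled ** 2)
    # Q2 with the degrees of freedom used as weights (w_r = n_r).
    q2 = n * math.log(pooled) - sum(n_r * math.log(s) for n_r, s in zip(dof, s2))
    # Assumed form of Bartlett's adjustment divisor.
    correction = 1 + (sum(1 / n_r for n_r in dof) - 1 / n) / (3 * (k - 1))
    return q1, q2 / correction

# Hypothetical example: two estimates, each on 3 degrees of freedom.
q1, q2_adjusted = homogeneity_criteria([1.0, 4.0], [3, 3])
```

Both values would then be referred to the $\chi^2$ distribution for $k - 1$ degrees of freedom; when all the $s_r^2$ are equal, both criteria are zero.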

U. S. Nair and D. J. Bishop have given theoretical evidence which suggests that when
$n_r \ge 2$ $(r = 1, 2, \ldots, k)$, Bartlett's adjusted $Q_2$ may be expected to conform to the $\chi^2$
distribution reasonably well in the neighborhood of the 5% and 1% levels. Using 1000
samples of 4 for which $n_r s_r^2/(n_r + 1)$ has been tabulated by W. A. Shewhart in Table D,
Appendix II of his "Economic Control of Quality of Manufactured Product," 200 values of
$Q_1$ and $Q_2$ (with adjustment) were calculated and compared with the $\chi^2$ distribution for
$k - 1$ degrees of freedom. Two cases were studied: Case I, $k = 5$ and $n_1 = n_2 = \cdots = 3$;
Case II, $k = 3$ and $n_1 = n_2 = 3$ while $n_3 = 9$. As measured by the Chi-Square Goodness of
Fit Test, using 11 degrees of freedom, the fits were good in all four instances. In Case I,
for Bartlett's adjusted $Q_2$ the test led to .80 < P < .90, and to .70 < P < .80 for the
Brandt-Stevens $Q_1$; in Case II, the fits were poorer with .50 < P < .70 for Bartlett's criterion and
.10 < P < .20 for the Brandt-Stevens. However, an examination of the descending cumulative
distributions showed that in all instances these criteria exhibited a deficiency of large
values of $\chi^2$, with the deficiency, in general, more marked in the case of the Brandt-Stevens
test. Consequently, when one uses significance levels for these criteria obtained by means
of the $\chi^2$ approximation advocated, one is in reality using a level of significance slightly
less than that professed. The discrepancy is not great, however, and is on the safe side, i.e.
one will reject $H_0$ falsely in the long run less often than one professes to be doing. Without
doubt, however, one will also detect the falsehood of $H_0$ when $\sigma_r^2 \ne \sigma_t^2$, for at least one pair
of values of $r$ and $t$, $r \ne t$, less often in the long run by the use of these approximate
significance levels than if the true levels were used, but we have no definite evidence at present
on this point. A somewhat disquieting feature is that the agreement between the $\chi^2$ values
yielded by the two criteria becomes worse as one proceeds toward larger values of $\chi^2$ in
terms of either quantity. Thus, of 8 samples which $Q_2$ would have rejected at the 8% level
in Case I, only 4 of these would have been rejected by $Q_1$, and $Q_2$ would have passed 3
samples of the 7 rejected by $Q_1$. Thus it appears that, if one wishes to work with a given
chance of rejecting $H_0$ falsely, one should choose one of these criteria and then stick to it in
future applications. For large values of the $n_r$ the two criteria tend to equivalence, so the
choice between them is of interest mainly for small $n_r$, but cannot be made with full
information until more is known about the bias, if any, of the Brandt-Stevens test, and the
relative power of the two tests with regard to alternatives to $H_0$.


On a Test Whether Two Samples are from the Same Population. A. Wald 

and J. Wolfowitz, Columbia University and Brooklyn, New York. 


Let $X$ and $Y$ be two independent random variables about whose distributions nothing is
known except that they are continuous. Let $x_1, x_2, \ldots, x_m$ be a set of $m$ independent
observations on $X$ and let $y_1, y_2, \ldots, y_n$ be a set of $n$ independent observations on $Y$.
The null hypothesis to be tested is that the distributions of $X$ and $Y$ are identical.

Let the set of $m + n$ observations be arranged in order of magnitude, thus: $z_1, z_2, \ldots, z_{m+n}$.
Replace $z_i$ by $v_i$ $(i = 1, 2, \ldots, m + n)$ where $v_i = 0$ if $z_i$ is a member of the set of
$x$'s and $v_i = 1$ if $z_i$ is a member of the set of $y$'s. Since the null hypothesis states only that
the distributions of $X$ and $Y$ are identical without specifying them in any other way, the
distribution of the statistic $U$ used for testing the null hypothesis must be independent of
this common distribution of $X$ and $Y$. It can easily be shown that the statistic $U$ must be
a function only of the sequence $v_1, v_2, \ldots, v_{m+n}$.

A subsequence $v_s, v_{s+1}, \ldots, v_{s+r}$ (where $r$ may also be 0) is called a run if
$v_s = v_{s+1} = \cdots = v_{s+r}$ and if $v_{s-1} \ne v_s$ when $s > 1$ and if $v_{s+r} \ne v_{s+r+1}$ when $s + r < m + n$. The
statistic $U$ defined as the number of runs in the sequence $v_1, v_2, \ldots, v_{m+n}$ seems a suitable
statistic for testing the null hypothesis. A difference in the distribution functions of $X$
and $Y$ tends to decrease $U$. Hence the critical region is defined by the inequality $U < u_0$,
where $u_0$ depends only on $m$, $n$, and the level of significance adopted. If $m \le n$ and
$P\{U = c\}$ is the probability that $U = c$, then:

$P\{U = 2k\} = \frac{2 \binom{m-1}{k-1} \binom{n-1}{k-1}}{\binom{m+n}{m}}$  $(k = 1, 2, \ldots, m)$,

$P\{U = 2k - 1\} = \frac{\binom{m-1}{k-1} \binom{n-1}{k-2} + \binom{m-1}{k-2} \binom{n-1}{k-1}}{\binom{m+n}{m}}$  $(k = 2, 3, \ldots, m + 1)$.

The mean of $U$ is: $\frac{2mn}{m + n} + 1$.

The variance of $U$ is: $\frac{2mn(2mn - m - n)}{(m + n)^2 (m + n - 1)}$.

If $m/n = \alpha$ (a positive constant) and $m \to \infty$, the distribution of $U$ converges to the normal
distribution.
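For small samples the distribution of the number of runs can be confirmed by enumeration. The sketch below assumes the closed forms $P\{U = 2k\} = 2\binom{m-1}{k-1}\binom{n-1}{k-1} / \binom{m+n}{m}$ and $P\{U = 2k-1\} = \big[\binom{m-1}{k-1}\binom{n-1}{k-2} + \binom{m-1}{k-2}\binom{n-1}{k-1}\big] / \binom{m+n}{m}$, and compares them with a count over every arrangement, all of which are equally likely under the null hypothesis. The function names and the choice $m = 3$, $n = 5$ are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def count_runs(v):
    """Number of runs in a sequence of 0's and 1's."""
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)

def runs_distribution_formula(m, n):
    """P{U = u} from the closed forms, for m <= n."""
    total = comb(m + n, m)
    dist = {}
    for k in range(1, m + 1):
        dist[2 * k] = Fraction(2 * comb(m - 1, k - 1) * comb(n - 1, k - 1), total)
    for k in range(2, m + 2):
        num = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
               + comb(m - 1, k - 2) * comb(n - 1, k - 1))
        dist[2 * k - 1] = Fraction(num, total)
    # Drop values of U that cannot occur (probability zero).
    return {u: p for u, p in dist.items() if p > 0}

def runs_distribution_enumerated(m, n):
    """P{U = u} by listing every placement of the m x's among m + n positions."""
    counts = {}
    for xs in combinations(range(m + n), m):
        chosen = set(xs)
        v = [0 if i in chosen else 1 for i in range(m + n)]
        u = count_runs(v)
        counts[u] = counts.get(u, 0) + 1
    total = comb(m + n, m)
    return {u: Fraction(c, total) for u, c in counts.items()}

m, n = 3, 5
formula = runs_distribution_formula(m, n)
mean = sum(u * p for u, p in formula.items())
variance = sum(u * u * p for u, p in formula.items()) - mean ** 2
```

Exact fractions are used so that the mean and variance can be compared with the stated expressions without rounding error.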


The Distribution of Quadratic Forms in Non-Central Normal Random Variables.
William G. Madow, Washington, D. C. (Presented to the Institute
under a slightly different title)






Let the distribution of a sum of non-central squares of normally and independently
distributed random variables which have the unit variances be called the $\chi'^2$ distribution.
It is proved that if a set of quadratic forms have a sum which is the sum of the squares of
their variables, then a necessary and sufficient condition that the quadratic forms be
independently distributed in $\chi'^2$ distributions is that the rank of the sum of quadratic forms be
equal to the sum of the ranks of the quadratic forms. Furthermore, the constants on which
the $\chi'^2$ distributions depend may be obtained by substituting the values about which the
variables are taken for the variables themselves in the quadratic forms. Roughly speaking
the theorem states that if a set of quadratic forms satisfy the conditions of the Fisher-Cochran
theorem when the true means vanish, then the set of quadratic forms will be
independently distributed in $\chi'^2$ distributions when the true means do not vanish.

Some Theoretical Aspects of the Use of Transformations in the Statistical 
Analysis of Replicated Experiments. W. G. Cochran, Iowa State College. 

The device of transforming the data to a different scale before performing an analysis of 
variance has recently been recommended by a number of writers for replicated experiments 
in which the original data show a markedly skew distribution. The use of transformations 
to obtain an approximate analysis has been supported mainly on the grounds that in the 
transformed scale the true experimental error variance is approximately the same on all 
plots. This paper considers the relation of the method of transformations to a more exact 
analysis. Discussion is confined to the $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$ transformations, which appear
to receive the most frequent use in practice.

To obtain an exact analysis, it is necessary to specify (i) how the expected value on any
plot is obtained from unknown parameters representing the treatment and block (or row
and column) effects, (ii) how the observed values on the plots vary about the expected
values. If the latter variation follows the Poisson law (a case to which the square root
transformation has been considered appropriate), the equations of estimation by maximum
likelihood take the form

$$\sum_c \frac{x - m}{m}\,\frac{\partial m}{\partial c} = 0, \tag{1}$$

where $x$ is the observed and $m$ the expected value on any plot, $c$ is a typical unknown parameter,
and the summation extends over all plots whose expectations involve $c$. As the
number of parameters is usually large (e.g. 16 in a 6 × 6 Latin square), these equations are
laborious to solve; moreover, the question of obtaining small-sample tests of significance is
difficult. It is shown that if a particular form can be assumed for the prediction formula
in (i), namely that $\sqrt{m}$ is a linear function of the treatment and block (or row and column)
constants, the equations of estimation may be reduced to the simpler form

$$\sum_c 4\,(r' - \sqrt{m}) = 0, \tag{2}$$

where $r' = \tfrac{1}{2}\left(\sqrt{m} + \dfrac{x}{\sqrt{m}}\right)$ is a function closely related to the square root of $x$. It follows
that the statistical analysis in square roots, with some slight adjustments, coincides with
the maximum likelihood solution, provided that the above form can be assumed for the
prediction formula. The appropriateness of this form in practice is briefly considered and a
“goodness of fit” test by $\chi^2$ is developed. A numerical example is worked as an illustration
and indicates that a good approximation is obtained by the transformation alone even
with very small numbers per plot. The corresponding theory is also discussed for the inverse
sine transformation, which applies where the original data are percentages or fractions
whose experimental errors are derived from the binomial distribution.





In practice the type of analysis outlined above is unlikely to supplant the simple use of
transformations, because it can seldom be assumed that the experimental variance is
entirely of the Poisson or binomial type. The more exact analysis may, however, be
useful (i) for cases in which the plot yields are very small integers or the ratios of very
small integers, (ii) in showing how to give proper weight to an occasional zero plot yield.
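The claim that the error variance is roughly the same on all plots in the transformed scale can be checked directly for the Poisson case. The sketch below is an illustration under stated assumptions: the function name `sqrt_poisson_variance` is hypothetical, and the variance of $\sqrt{x}$ is computed by summing the Poisson probabilities over a truncated range rather than by any method described in the abstract.

```python
import math


def sqrt_poisson_variance(m):
    """Variance of sqrt(x) when x follows the Poisson law with mean m.

    The probabilities are summed up to a cut-off far in the upper tail,
    an assumption that makes the neglected mass negligible.
    """
    upper = int(m + 40 * math.sqrt(m) + 50)
    p = math.exp(-m)          # P{x = 0}
    e_root = 0.0              # accumulates E[sqrt(x)]
    e_x = 0.0                 # accumulates E[x] = E[(sqrt(x))^2]
    for k in range(upper + 1):
        e_root += p * math.sqrt(k)
        e_x += p * k
        p *= m / (k + 1)      # P{x = k + 1} from P{x = k}
    return e_x - e_root ** 2
```

For means such as 10, 40 and 160 the variance of $x$ itself ranges from 10 to 160, while the value returned for $\sqrt{x}$ stays close to 1/4 and approaches it as the mean grows. For very small means the approximation is rougher, which is the situation in which the abstract suggests the more exact analysis may be useful.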

The Standard Errors of Geometric and Harmonic Types of Index Numbers. 

By Nilan Norris, Hunter College. 

Various statisticians have made empirical studies of the sampling errors of certain types
of index numbers used in the United States and England. None of these writers has taken
advantage of the tools afforded by the modern theory of estimation, including fiducial
inference, as a means of arriving at direct and general expressions for estimating the standard
deviations of the sampling errors of geometric and harmonic types of index numbers.

A known expression for the first approximation to the variance of a function, as given by
the relation between the variance of the function and the variance of the argument, is
valid for that general class of distributions of which the variance and a higher moment
are finite. With the aid of this relation, there appear simple and useful forms for estimating
the standard errors of geometric and harmonic types of indexes. For sufficiently large
samples, these forms are valid for all of the types of distributions of price relatives, production
relatives, and similar observations ordinarily encountered, provided that there are
satisfied the necessary conditions for drawing sound inferences on the basis of sampling
without reference to the value of the variate.
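The abstract does not reproduce the resulting forms. As a hedged illustration of the first-approximation argument it describes, the sketch below applies the usual relation $\operatorname{var} f(\bar{y}) \approx [f'(\mu)]^2 \operatorname{var}(\bar{y})$ to the mean of the logarithms (geometric mean) and to the mean of the reciprocals (harmonic mean). The function names and the unweighted form are assumptions made for illustration, not necessarily the author's expressions.

```python
import math
from statistics import mean, stdev


def geometric_mean_se(x):
    """Unweighted geometric mean G and its approximate standard error.

    G = exp(mean of log x), so to a first approximation
    SE(G) is G times the standard error of the mean of log x.
    """
    logs = [math.log(v) for v in x]
    g = math.exp(mean(logs))
    return g, g * stdev(logs) / math.sqrt(len(x))


def harmonic_mean_se(x):
    """Unweighted harmonic mean H and its approximate standard error.

    H = 1 / (mean of 1/x), so to a first approximation
    SE(H) is H^2 times the standard error of the mean of 1/x.
    """
    recips = [1.0 / v for v in x]
    h = 1.0 / mean(recips)
    return h, h * h * stdev(recips) / math.sqrt(len(x))
```

For the relatives `[1.0, 2.0, 4.0]` the geometric mean is 2 and the harmonic mean is 12/7. With so few observations the standard errors are only indicative, since the approximation is a large-sample one.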

Necessary conditions for using tests of significance soundly in connection with index 
number problems are those of realistic and intimate acquaintance with observations, and 
careful attention to certain broad theoretical considerations which determine whether or 
not the index is suited for the purpose for which it is used. 

A Study of R. A. Fisher’s z Distribution and the Related F Distribution. L. A. 
Aroian, Hunter College. 

The following results for the z distribution and related F distribution are investigated: 

(1) Geometric properties.

(2) Exact values of the seminvariants and moments of $z$. Exact values of the first
four central moments of $F$.

(3) The approach to normality of both distributions as $n_1$ and $n_2$ become large in any
manner whatever.

(4) The Pearson types of approximating curves, the logarithmic normal approximation,
the Gram-Charlier approximation, and the uses of these in finding any level of
significance of $z$ and of $F$.

A Note on the Analysis of Variance with Unequal Class Frequencies. Abraham 
Wald, Columbia University. 

Let us consider $p$ groups of variates and denote by $m_j$ $(j = 1, \ldots, p)$ the number of
elements in the $j$-th group. Let $x_{ij}$ be the $i$-th element in the $j$-th group. We assume that
$x_{ij}$ is the sum of two variates $\epsilon_{ij}$ and $\eta_j$, i.e. $x_{ij} = \epsilon_{ij} + \eta_j$, where $\epsilon_{ij}$ $(i = 1, \ldots, m_j;\ j = 1, \ldots, p)$
is normally distributed with mean $\mu$ and variance $\sigma^2$, and $\eta_j$ $(j = 1, \ldots, p)$ is
normally distributed with mean $\mu'$ and variance $\sigma'^2$. All the variates $\epsilon_{ij}$ and $\eta_j$ are supposed
to be distributed independently. The intra-class correlation $\rho$ is given by

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$

Confidence limits for $\rho$ have been derived only in case of equal class frequencies, i.e.
$m_1 = m_2 = \cdots = m_p$. We give here the confidence limits for $\rho$ in case of unequal class
frequencies. Since $\rho$ is a monotonic function of $\sigma'^2/\sigma^2$, it is sufficient to derive confidence
limits for $\sigma'^2/\sigma^2$. Denote $\sigma'^2/\sigma^2$ by $\lambda^2$ and the arithmetic mean of the $j$-th group by $\bar{x}_j$. Let

$$w_j = \frac{m_j}{1 + m_j\lambda^2},$$

and denote by $F_1$ and $F_2$ the lower and upper confidence limits respectively of $F$, where $F$
has the analysis of variance distribution with $p - 1$ and $N - p = m_1 + \cdots + m_p - p$
degrees of freedom. Then the lower confidence limit $\lambda_1^2$ of $\lambda^2$ is given by the root of the equation
in $\lambda^2$:

$$f(\lambda^2) = \frac{N - p}{p - 1}\cdot\frac{\sum_j w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2} = F_2, \tag{1}$$

and the upper confidence limit $\lambda_2^2$ of $\lambda^2$ is given by the root of

$$f(\lambda^2) = F_1. \tag{2}$$

For calculating the roots of (1) and (2), we can make use of the fact that $f(\lambda^2)$ is monotonically
decreasing with increasing $\lambda^2$.
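Because $f(\lambda^2)$ decreases monotonically, each root can be bracketed and found by bisection. The sketch below is one way to do this, not the author's procedure: the names `f_stat` and `solve_lambda2` are hypothetical, the percentage points $F_1$ and $F_2$ are assumed to be supplied by the user (for example from tables), and a root that would be negative is reported as zero by assumption.

```python
def f_stat(groups, lam2):
    """The statistic f(lambda^2) for a list of groups of observations."""
    p = len(groups)
    big_n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Weights w_j = m_j / (1 + m_j * lambda^2).
    w = [len(g) / (1.0 + len(g) * lam2) for g in groups]
    wmean = sum(wj * xb for wj, xb in zip(w, means)) / sum(w)
    between = sum(wj * (xb - wmean) ** 2 for wj, xb in zip(w, means))
    within = sum((x - xb) ** 2 for g, xb in zip(groups, means) for x in g)
    return (big_n - p) / (p - 1) * between / within


def solve_lambda2(groups, target, tol=1e-10):
    """Root of f(lambda^2) = target by bisection, using monotonic decrease."""
    if f_stat(groups, 0.0) <= target:
        return 0.0                      # no positive root: report zero
    lo, hi = 0.0, 1.0
    while f_stat(groups, hi) > target:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f_stat(groups, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `solve_lambda2(groups, F2)` gives the lower limit and `solve_lambda2(groups, F1)` the upper limit. Limits for $\rho$ then follow from $\rho = \lambda^2/(1 + \lambda^2)$.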


An Approach to Problems Involving Disproportionate Frequencies. Burton 
D. Seeley, Washington, D. C. 

Applied mechanics offers an analysis of variance solution to problems of multiple classification
involving disproportionate sub-class numbers. The quality of orthogonality may
be attained in such problems by measuring the variability between classes of any one
classification after centering the others. This approach, which is not limited by the number
of classes or the number of classifications, treats the problem involving equal sub-class
numbers as a special phase of the general analysis of variance.



CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics.

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on 

Publications 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secretary-Treasurer,
elected for a term of one year by a majority ballot at the annual meeting
of the Institute. Voting may be in person or by mail.

(a) Exception. The first group of Officers shall be elected by a majority vote of the
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers and the previous
President.

3. The Institute shall have a Committee on Membership composed of three Fellows. 
At their first meeting subsequent to the adoption of this Constitution, the Board of 
Directors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect 
from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The president shall designate one of the Vice-Presidents as 
Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
President as Ex Officio Chairman of this Committee. 

ARTICLE IV 
Meetings

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 



time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the 
Secretary-Treasurer at least five days before the date set therefor. Should other business
be passed upon, any member of the Committee shall have the right to reopen the question
at the next meeting.

4. At a regularly convened meeting of the Board of Directors, three members shall
constitute a quorum. At a regularly convened meeting of the Committee on Membership,
two members shall constitute a quorum.

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute.
Other publications may be originated by the Board of Directors as occasion arises.

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly
convened meeting of the Institute provided notice of such proposed amendment
shall have been sent to each voting member by the Secretary-Treasurer at least thirty
days before the date of the meeting at which the proposal is to be acted upon. Voting
may be in person or by mail.

BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and

Committee on Publications

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present,




shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to
the time of the meeting.

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings
at the meetings of the Institute and of the Board of Directors, send out calls for said
meetings and, with the approval of the President and the Board, carry on the correspondence
of the Institute. Subject to the direction of the Board, he shall have charge
of the archives and other tangible and intangible property of the Institute. He shall
send out calls for annual dues and acknowledge receipt of same; pay all bills approved
by the President for expenditures authorized by the Board or the Institute; keep a
detailed account of all receipts and expenditures, prepare a financial statement at the
end of each year and present an abstract of the same at the annual meeting of the Institute
after it has been audited by a Member or Fellow of the Institute appointed by the
President as Auditor. The Auditor shall report to the President.

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of
Directors, shall have charge of all matters connected with the publications of the Institute,
and of all books, pamphlets, manuscripts and other literary or scientific material
collected by the Institute. Once a year this Committee shall cause to be printed in the
Official Journal the Constitution and By-Laws and a classified list of all the Members
and Fellows of the Institute.


ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual 
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 





may be six months in arrears, and to accompany such notice by a copy of this Article.
If such person fail to pay such dues within three months from the date of mailing such
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors,
by whom the person’s name may be stricken from the rolls and all privileges of membership
withdrawn. Such person may, however, be re-instated by the Board of Directors
upon payment of the arrears of dues.

ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a
majority vote at any regularly convened meeting of the Institute, if the proposed amendment
has been previously approved by the Board of Directors.



DIRECTORY OF THE INSTITUTE OF MATHEMATICAL STATISTICS 


(As of January 1, 1940) 

Please notify the Secretary, P. R. Rider, Washington University, 
St. Louis, Mo., of any changes of addresses or errors in names, titles or 
addresses as printed. 


Anderson, Mr. Paul H. 108 E. Chalmers St., Champaign, Ill.

Acerboni, Dr. Argentino V. Larroque 232 Banfield, Buenos Aires, Argentina.

Alter, Prof. Dinsmore. Director of Griffith Observatory, Los Angeles, Calif.

Arnold, Prof. Herbert E. Dept. of Math., Wesleyan Univ., Middletown, Conn.

Aroian, Prof. Leo. Dept. of Math., Hunter College, Lexington Ave. and E. 68th St., New York, N. Y.

Bacon, Prof. H. M. Box 1144, Stanford Univ., Calif.

Baker, Dr. G. A. Experiment Station, College of Agriculture, Univ. of Calif., Davis, Calif..

Barral-Souto, Dr. José. Cordoba 1469, Buenos Aires, Argentina.

Barrett, Mr. C. S. 3145 Maple Ave., Brookfield, Ill.

Beall, Mr. Geoffrey. 729 Queen St., Chatham, Ontario, Canada.

Been, Mr. Richard O. 32 Old Fisheries Bldg., U. S. Bureau of Agricultural Economics, Washington, D. C.

Bennett, Prof. A. A. Dept. of Math., Brown Univ., Providence, R. I.

Berkson, Dr. Joseph. Mayo Clinic, Rochester, Minn.

Bernstein, Prof. Felix. Dept. of Biometry, New York Univ., New York, N. Y..

Boley, Mr. Charles C. 305 Ceramics Bldg., Urbana, Ill.

Bridger, Mr. Clyde A. 1801 No. Eleventh St., Boise, Idaho.

Brooks, Mr. A. G. 5819 W. Erie St., Chicago, Ill.

Burgess, Dr. R. W. Western Electric Co., 195 Broadway, New York, N. Y.

Burr, Prof. Irving W. Antioch College, Yellow Springs, Ohio.

Bushey, Prof. J. Hobart. Dept. of Math., Hunter College, Park Ave. and 68th St., New York, N. Y.

Camp, Prof. B. H. Dept. of Math., Wesleyan Univ., Middletown, Conn..

Camp, Prof. C. C. Dept. of Math., Univ. of Nebraska, Lincoln, Neb.

Carlson, Mr. John L. 736 West St., Reno, Nev.

Carver, Prof. H. C. Dept. of Math., Univ. of Michigan, Ann Arbor, Mich..

Chapman, Mr. Roy A. Forest Service, U. S. Dept. of Agriculture, 1000 Masonic Temple, New Orleans, La.

Clark, Prof. Andrew G. 1012 Laporte Ave., Ft. Collins, Colo.

Cochran, Prof. W. G. Statistical Laboratory, Iowa State College, Ames, Iowa.

Coleman, Mr. E. P. Dept. of Math., Municipal Univ. of Omaha, Omaha, Neb.

Cox, Dr. Gerald J. Mellon Institute, Pittsburgh, Pa.

Craig, Prof. A. T. Dept. of Math., Univ. of Iowa, Iowa City, Iowa..

Craig, Prof. C. C. Dept. of Math., Univ. of Michigan, Ann Arbor, Mich..

Crathorne, Prof. A. R. Dept. of Math., Univ. of Illinois, Urbana, Ill..

Crowe, Prof. S. E. 137 University Drive, East Lansing, Mich.

Curtiss, Dr. J. H. Belle Ayre Apts., Ithaca, N. Y.

Dahmus, Mr. Maurice E. 203 So. Fourth, Champaign, Ill.



Daly, Dr. Joseph F. 719 Irving St., N.E., Washington, D. C.

Dantzig, Mr. George B. 2609 Fulton St., Berkeley, Calif.

DeLury, Dr. D. B. Dept. of Math., Univ. of Toronto, Toronto 5, Canada.

Deming, Dr. W. E. Bureau of the Census, U. S. Dept. of Commerce, Washington, D. C..

Dodd, Prof. E. L. Dept. of Math., Univ. of Texas, Austin, Texas..

Dodge, Mr. Harold F. Bell Telephone Lab., 463 West St., New York, N. Y.

Doob, Dr. J. L. Dept. of Math., Univ. of Illinois, Urbana, Ill..

Dressel, Dr. Paul L. 6126 S. Woodlawn Ave., Chicago, Ill.

Dunlap, Prof. Jack W. Catharine Strong Hall, Univ. of Rochester, Rochester, N. Y.

Dwyer, Prof. Paul S. 407 Camden Ct., Ann Arbor, Mich.

Edgett, Dr. G. L. Queen’s Univ., Kingston, Ontario, Canada.

Eisenhart, Dr. Churchill. Dept. of Math., Univ. of Wisconsin, Madison, Wis.

Elston, Mr. James S. Travelers Insurance Co., Hartford, Conn.

Elting, Mr. John P. Director of Research, Kendall Mills, Paw Creek, N. C.

Elveback, Miss Mary L. Princeton Surveys, Princeton, N. J.

Ettinger, Mr. W. J. Edison General Electric Appliance Co., 5600 W. Taylor St., Chicago, Ill.

Evans, Prof. H. P. North Hall, Univ. of Wisconsin, Madison, Wis.

Feldman, Dr. H. M. Soldan High School, St. Louis, Mo.

Fertig, Dr. John W. Dept. of Biostatistics, School of Hygiene and Public Health, Johns Hopkins Univ., Baltimore, Md.

Fischer, Dr. C. H. Dept. of Math., Wayne Univ., Detroit, Mich.

Fisher, Prof. Irving. 460 Prospect St., New Haven, Conn..

Flood, Dr. M. M. Princeton Surveys, Princeton, N. J.

Foster, Mr. Ronald M. 122 E. Dudley Ave., Westfield, N. J.

Frankel, Mr. Lester R. 1445 Otis Place, N.W., Washington, D. C.

Freeman, Prof. H. A. Dept. of Economics, Massachusetts Inst. of Technology, Cambridge, Mass.

Fry, Dr. T. C. Bell Telephone Lab., 463 West St., New York, N. Y..

Gavett, Prof. G. I. Dept. of Math., Univ. of Washington, Seattle, Wash.

Geiringer, Dr. Hilda. Bryn Mawr College, Bryn Mawr, Pa.

Gibson, Mr. Robert W. 503 W. California, Urbana, Ill.

Gill, Mr. John P. Dept. of Math., Univ. of Alabama, University, Ala.

Girshick, Mr. M. A. Bureau of Home Economics, Dept. of Agriculture, Washington, D. C.

Glover, Prof. J. W. 620 Oxford Rd., Ann Arbor, Mich..

Greenwood, Prof. J. A. College Station, Box 245, Durham, N. C.

Greville, Dr. Thos. N. E. Dept. of Math., Univ. of Michigan, Ann Arbor, Mich.

Grove, Prof. C. C. 143 Milburn Ave., Baldwin, L. I., N. Y.

Hart, Prof. W. L. Dept. of Math., Univ. of Minnesota, Minneapolis, Minn.

Hebley, Mr. Henry F. Product Control Manager, Pittsburgh Coal Co., P. O. Box 145, Pittsburgh, Pa.

Heidingsfield, Mr. Myron. Hotel Marcy, 720 West End Ave., New York, N. Y.

Henderson, Mr. Robert. Crown Point, Essex Co., N. Y..

Hendricks, Mr. Walter A. Division of Agricultural Statistics, Agricultural Marketing Service, U. S. Dept. of Agriculture, Washington, D. C.

Henry, Mr. M. H. Route No. 4, Box 394 M, Lansing, Mich.

Hoel, Dr. Paul G. Dept. of Math., Univ. of California at Los Angeles, Los Angeles, Calif.

Horst, Dr. Paul. Supervisor of Selection Research, Procter and Gamble, Cincinnati, Ohio.

Hotelling, Prof. Harold. Fayerweather Hall, Columbia Univ., New York, N. Y..

Huntington, Prof. E. V. 48 Highland St., Cambridge, Mass..

Hurwitz, Mr. William. 119 Concord Ave., Washington, D. C.

Ingraham, Prof. M. H. North Hall, Univ. of Wisconsin, Madison, Wis..

Jackson, Prof. Dunham. 119 Folwell Hall, Univ. of Minnesota, Minneapolis, Minn..





Jacobs, Mr. Walter. 5119 Third St., N.W., Washington, D. C.

Janko, Dr. Jaroslav. Praha XII, Krkonosska 3, Czechoslovakia.

Juran, Mr. J. M. Western Electric Co., 195 Broadway, New York, N. Y.

Keeping, Prof. E. S. Univ. of Alberta, Edmonton, Alberta, Canada.

Kelley, Prof. Truman L. Lawrence Hall, Cambridge, Mass..

Kenney, Mr. John F. Dept. of Math., Northwestern Univ., Evanston, Ill.

Kiernan, Prof. Charles J. 14 Fairbanks St., Hillside, N. J.

Knowler, Dr. Lloyd A. Dept. of Math., Univ. of Iowa, Iowa City, Iowa.

Knudsen, Miss Lila F. Junior Mathematician, Food Division, U. S. Dept. of Agriculture, Washington, D. C.

Kohl, Miss Alma. 2309 Russell St., Berkeley, Calif.

Kossack, Mr. Carl F. Dept. of Math., Univ. of Oregon, Eugene, Oregon.

Kullback, Dr. Solomon. Office of Chief Signal Officer, U. S. Navy Department, Washington, D. C..

Kurtz, Dr. Albert K. 173 Cornwall St., Hartford, Conn.

Laderman, Mr. Jack. 3224 Bronx Blvd., Bronx, N. Y.

Larsen, Prof. Harold D. Dept. of Math., Univ. of New Mexico, Albuquerque, N. M.

Leavens, Mr. Dickson H. Cowles Commission, Univ. of Chicago, Chicago, Ill.

Lemme, Prof. Maurice M. Dept. of Math., Univ. of Toledo, Toledo, Ohio.

Lindsey, Mr. Fred D. 2051 No. Brandywine St., Arlington, Va.

Livers, Mr. Joe J. Dept. of Math., Montana State College, Bozeman, Mont.

Lorge, Dr. Irving. 525 W. 120 St., New York, N. Y.

Lotka, Dr. A. J. One Madison Ave., New York, N. Y..

Lundberg, Prof. George A. Dept. of Sociology, Bennington College, Bennington, Vt.

Madow, Mr. William G. Hotel Lafayette, Washington, D. C..

Malzberg, Dr. Benjamin. New York State Dept. of Mental Hygiene, Albany, N. Y.

Mansfield, Prof. Ralph. Dept. of Math., Chicago Teachers College, 6800 Stewart Ave., Chicago, Ill.

Mauchly, Prof. J. W. Dept. of Physics, Ursinus College, Collegeville, Pa.

Mayer, Mr. George F. T. 1519 Seventh St., S.E., Minneapolis, Minn.

Mises, Prof. R. von. Harvard Univ., Cambridge, Mass..

Mode, Prof. E. B. 688 Boylston St., Boston, Mass.

Molina, Mr. E. C. 463 West St., New York, N. Y..

Mudgett, Prof. Bruce D. Dept. of Economics, Univ. of Minnesota, Minneapolis, Minn.

MacLean, Mr. M. C. Chief of Census Analysis, Dominion Bureau of Statistics, Ottawa, Ontario, Canada.

McEwen, Prof. G. F. Scripps Institution, La Jolla, Calif.

Macphail, Dr. M. S. Dept. of Math., Acadia Univ., Wolfville, Nova Scotia, Canada.

McCarthy, Dr. Michael C. Dept. of Math., University College, Cork, Ireland.

Nesbitt, Dr. C. J. Dept. of Math., Univ. of Michigan, Ann Arbor, Mich.

Neyman, Prof. J. Dept. of Math., Univ. of California, Berkeley, Calif..

Norris, Dr. Nilan. Dept. of Economics, Hunter College, New York, N. Y.

Norton, Mr. K. A. 2123 Tunlaw Rd., N.W., Washington, D. C.

Olds, Prof. E. G. 953 La Clair Ave., Regent Square, Pittsburgh, Pa..

Ollivier, Dr. Arthur. Box 405, State College, Miss.

Olmstead, Dr. Paul S. 463 West St., New York, N. Y.

O’Toole, Dr. A. L. 6363 Sheridan Rd., Chicago, Ill..

Ozanne, Mr. Paul. Dept. of Agronomy, College of Agriculture, Univ. of Wisconsin, Madison, Wis.

Ore, Prof. Oystein. Dept. of Math., Yale Univ., New Haven, Conn.

Palley, Mr. A. 286 Commonwealth, Buffalo, N. Y.

Parente, Mr. A. R. 126 Church St., Hamden, Conn.

Perryman, Mr. James H. 12 Richmond Ave., Ilfracombe, Devon, England.




Petrie, Prof. George W. South Dakota School of Mines, Rapid City, S. Dak.

Pierce, Prof. J. A. Dept. of Math., Atlanta Univ., Atlanta, Ga.

Pixley, Prof. H. H. Dept. of Math., Wayne Univ., Detroit, Mich.

Pollard, Prof. H. S. Dept. of Math., Miami Univ., Oxford, Ohio.

Reed, Prof. L. J. 815 No. Wolfe St., Baltimore, Md.

Rider, Prof. P. R. Dept. of Math., Washington Univ., St. Louis, Mo..

Richardson, Prof. C. H. Bucknell Univ., Lewisburg, Pa.

Regan, Prof. Francis. Dept. of Math., St. Louis Univ., Grand and Pine Blvd., St. Louis, Mo.

Rietz, Prof. H. L. Dept. of Math., Univ. of Iowa, Iowa City, Iowa..

Romig, Mr. H. G. Bell Telephone Lab., 463 West St., New York, N. Y.

Roos, Dr. Charles F. Inst. of Applied Econometrics, 420 Lexington Ave., New York, N. Y..

Rulon, Prof. P. J. 13 Kirkland St., Cambridge, Mass.

Scarborough, Prof. J. B. P. O. Box 86, Annapolis, Md.

Schwartz, Mr. Herman M. College of Arts and Sciences, Duquesne Univ., Pittsburgh, Pa.

Shaw, Mr. Lawrence W. Fayerweather Hall, Columbia Univ., New York, N. Y.

Shewhart, Dr. W. A. 158 Lake Drive, Mountain Lakes, N. J..

Shohat, Prof. J. Dept. of Math., Univ. of Pennsylvania, Philadelphia, Pa..

Simon, Capt. Leslie E. Aberdeen Proving Ground, Aberdeen, Md.

Singleton, Mr. Robert R. 20 Nassau St., Princeton, N. J.

Spiegelman, Mr. Mortimer. 325 W. 86th St., New York, N. Y.

Stephan, Mr. F. F. 1826 K St., N. W., Washington, D. C.

Stouffer, Prof. S. A. Dept. of Sociology, Univ. of Chicago, Chicago, Ill.

Swanson, Mr. A. G. General Motors Inst., Flint, Mich.

Tai, Mr. Shih-Kuang. Tsing Hua Univ., Kun Ming, Yun Nan, China.

Thompson, Dr. William R. 27 Oakwood St., Albany, N. Y.

Thurstone, Prof. L. L. Dept. of Psychology, Univ. of Chicago, Chicago, Ill.

Toops, Prof. Herbert A. 458 W. 8th Ave., Columbus, Ohio.

Torrey, Miss Mary N. 463 West St., New York, N. Y.

Treloar, Prof. Alan E. Dept. of Biometry, Univ. of Minnesota, Minneapolis, Minn.

Trimble, Mrs. Anne. Apt. 612, 6104 Woodlawn Ave., Chicago, Ill.

Trubridge, Dr. G. F. P. 98 Kingsway, Petts Wood, Kent, England.

Vatnsdal, Mr. J. R. 304 Columbia St., Pullman, Wash.

Vickery, Dr. C. W. 304 E. 5th St., Austin, Texas.

Waite, Dr. Warren C. Division of Agricultural Economics, Univ. Farm, St. Paul, Minn.

Walker, Prof. Helen M. Teachers College, Columbia Univ., New York, N. Y.

Wald, Dr. A. Fayerweather Hall, Columbia Univ., New York, N. Y..

Wareham, Mr. Ralph E. 118 Elmer Ave., Schenectady, N. Y.

Wei, Dzung-shu. International House, 19 Evans St., Iowa City, Iowa.

Weida, Prof. F. M. Dept. of Math., George Washington Univ., Washington, D. C..

Welker, Mr. E. L. 160 Mathematics Bldg., Univ. of Illinois, Urbana, Ill.

Wescott, Mr. Mason E. 1936 Greenwood Ave., Wilmette, Ill.

White, Prof. A. B. Dept. of Math., Kansas State College, Manhattan, Kan.

Wilder, Dr. Marian. 115 Bedford S. E., Minneapolis, Minn.

Wilks, Prof. S. S. Fine Hall, Princeton Univ., Princeton, N. J..

Wilson, Dr. Elizabeth. One Waterman St., Cambridge, Mass.

Wilson, Mr. W. P. P. O. Box 1939, University, Ala.

Wolfenden, Mr. Hugh H. 182 Rosedale Heights Drive, Toronto 5, Canada.

Wright, Prof. Sewall. Dept. of Zoology, Univ. of Chicago, Chicago, Ill..

Wyckoff, Mr. J. F. Dept. of Math., Trinity College, Hartford, Conn.

Zoch, Mr. Richmond T. Cosmos Club, 1520 H St., Washington, D. C.

Zubin, Dr. Joseph. 722 W. 168th St., New York, N. Y.




LIMITING DISTRIBUTIONS OF QUADRATIC AND BILINEAR FORMS 

By William G. Madow 




1. Introduction. In a previous paper [15], several generalizations of the
theorem of Fisher [6, p. 97] and Cochran [2, p. 178] on the joint distribution of
quadratic forms in normally and independently distributed random variables
were derived. The chief purpose of this paper is a demonstration that the
Fisher-Cochran theorem and its generalizations are valid in the limit under conditions
completely analogous to those under which the Laplace-Liapounoff
theorem holds. Applications to the analysis of variance, periodogram analysis
and multivariate analysis are discussed.

Our general procedure will be to find algebraic conditions on the matrices of
quadratic and bilinear forms which enable us to assert that the limiting distributions
of these forms are those which they would have had if the variables, the
squares or products of which appear in their canonical forms, had been normally
and independently distributed.³ One thing which makes this possible is the
fact that many frequently used quadratic and bilinear forms have the same
rank no matter what may be the number of variables of which they are functions.
For example, the rank of the square of the arithmetic mean, $\bar{x}_n$, where

$$\bar{x}_n = \frac{1}{n}(x_1 + \cdots + x_n),$$

is one for all values of $n$. In this case the quadratic form,

$$\frac{1}{n^2}\sum_{\mu,\nu} x_\mu x_\nu,$$

is a function of the $n$ variables $x_1, x_2, \ldots, x_n$.
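The statement that this form has rank one for every $n$ can be verified mechanically. The sketch below is an illustration only: the helper names `rank` and `mean_square_matrix` are hypothetical, and exact rational elimination is used so that no numerical tolerance is needed. It builds the matrix of the form, every entry of which is $1/n^2$, and computes its rank.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    rows = [[Fraction(a) for a in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            if factor:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def mean_square_matrix(n):
    """Matrix of the quadratic form (1/n^2) * sum over mu, nu of x_mu x_nu."""
    return [[Fraction(1, n * n)] * n for _ in range(n)]
```

Here `rank(mean_square_matrix(n))` equals 1 whatever the value of `n`, whereas the identity matrix of order `n`, which is the matrix of $\sum x_\mu^2$, has rank `n`.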

In paragraph 2 we state the vector form of the Laplace-Liapounoff theorem 
and several corollaries. The joint limiting distributions of quadratic and 
bilinear forms are derived in paragraph 3. The final paragraph is devoted to a 
statement of a few applications of the theorems. 


1 Much of this research was done under a grant-in-aid from the Carnegie Corporation of
New York.

2 The material contained in this paper was presented in part to the American Statistical
Association, December 28, 1937, and in part to the Institute of Mathematical Statistics,
December 27, 1938.

3 We shall be chiefly concerned with conditions under which the limiting distributions
are not themselves normal. If the limiting distributions are normal, then generally under
the conditions we state, the Laplace-Liapounoff theorem will have been directly applicable.



2. The Laplace-Liapounoff theorem. 4 We shall first state some definitions 
and terminology which will be used throughout the paper. 

If used as subscripts or superscripts, or as indices of summation or
multiplication, the letters $i, j$ will take on all integral values from 1 through $p$, the letters
$\mu, \nu$ will take on all integral values from 1 through $n$, the letters $\gamma, \delta$ will take on
all integral values from 1 through $m$, the letter $\alpha$ will take on all integral values
from 1 through $k$, and the letter $\beta$ will take on all integral values from 1 through
$k - 1$, unless explicit statement to the contrary is made.

The totality of all sets of $\nu$ real numbers will be denoted by $R^\nu$. Thus $R^\nu$ is
the combinatory product of the spaces $R^1, R^1, \ldots, R^1$, ($\nu$ times).

If $x_1, \ldots, x_n$ are random variables, and if $A$ is a proposition concerning
$x_1, \ldots, x_n$, then by $P\{A\}$ we shall mean “the probability that $A$.” The
distribution function of the random variables $x_1, \ldots, x_n$ will be denoted by
$F(x_1, \ldots, x_n)$, i.e.

\[ F(x_1^0, \ldots, x_n^0) = P\{x_1 < x_1^0, \ldots, x_n < x_n^0\} \]

for all sets of $n$ real numbers. Thus $F$ will have an operational meaning in
this paper.

If $A(x_1, \ldots, x_n)$ is a function of $x_1, \ldots, x_n$ defined on $R^n$ and measurable 5
with respect to $F(x_1, \ldots, x_n)$, then $E\{A(x_1, \ldots, x_n)\}$ will be defined by the
equation,

\[ E\{A(x_1, \ldots, x_n)\} = \int_{R^n} A(x_1, \ldots, x_n) \, dF(x_1, \ldots, x_n), \]

where the integral is a Lebesgue-Stieltjes or Radon integral. Hence
$|A(x_1, \ldots, x_n)|$ is assumed to be integrable with respect to $F(x_1, \ldots, x_n)$.

If $Q(y_1, \ldots, y_p)$ is a single valued measurable function of $y_1, \ldots, y_p$ on
$R^p$, and if $y_i$ is a real single valued Borel measurable 6 function of $x_1, \ldots, x_n$
on $R^n$, then upon substituting for $y_1, \ldots, y_p$ it is seen that $Q(y_1, \ldots, y_p)$

4 Although the theorems will be stated in terms of probability distributions, Borel
measurability, and Lebesgue-Stieltjes integrability, it may simplify the reading if the
words “probability distributions” are replaced by probability densities or statistical
distributions, “Borel measurability” is replaced by continuity, and “Lebesgue-Stieltjes
integrability” is replaced by Riemann integrability.

5 A function $A(x_1, \ldots, x_n)$ defined on $R^n$ is said to be measurable with respect to a
distribution function $F(x_1, \ldots, x_n)$ if the set $E(t)$ of all $x_1, \ldots, x_n$ such that $A(x_1, \ldots, x_n) < t$
is such that $\int_{E(t)} dF(x_1, \ldots, x_n)$ is defined for all $t$.

6 All subsets of $R^n$ which may be formed from the totality of intervals of $R^n$ by repeated
summations or multiplications of not more than a denumerable number of intervals of
$R^n$, and $R^n$ itself, constitute the totality of Borel sets of $R^n$. The function $y(x_1, \ldots, x_n)$,
defined on $R^n$, is a Borel measurable function of $x_1, \ldots, x_n$ on $R^n$ if the set of values of
$x_1, \ldots, x_n$ such that $y(x_1, \ldots, x_n) < t$ is a Borel set for all $t$. The class of continuous
functions is contained in the class of Borel measurable functions. For further details,
see [3, chs. 1, 2], [11, ch. 3] and [17, chs. 1, 2, 3].




is a single-valued measurable function, $A(x_1, \ldots, x_n)$, of $x_1, \ldots, x_n$ on $R^n$.
If $x_1, \ldots, x_n$ are random variables, then $y_1, \ldots, y_p$ are random variables,
and 7

\[ E\{Q(y_1, \ldots, y_p)\} = E\{A(x_1, \ldots, x_n)\}. \tag{2.1} \]

We shall call $E(x_i)$ the mean value of $x_i$, $\sigma_{ij}$ the covariance of $x_i$ and $x_j$,
and $\sigma_{ii}$ or $\sigma_i^2$ the variance of $x_i$, where $\sigma_{ij} = E\{(x_i - Ex_i)(x_j - Ex_j)\}$.

The Laplace-Liapounoff, or Central Limit theorem states conditions under 
which linear functions of random variables have a normal limiting distribution. 
The general characteristic of the proofs of the theorem is that conditions are 
placed on the random variables so that they may virtually be assumed to be 
bounded. The Lindeberg 8 condition, which we shall use, is perhaps the least 
restrictive of all the conditions which require finite means and variances. 

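The Lindeberg condition 9, $\mathfrak{L}_p$: A set of random variables $x_{i\nu n}$ will be said to
satisfy the Lindeberg condition $\mathfrak{L}_p$ if there exists, for any preassigned positive
real numbers $\delta$ and $\epsilon$, a positive integer $n_0$ such that if $n > n_0$, then

\[ \sum_\nu \int_{|z_{\nu n}| > \epsilon} z_{\nu n}^2 \, dF(x_{1\nu n}, \ldots, x_{p\nu n}) \le \delta, \]

where

\[ z_{\nu n}^2 = x_{1\nu n}^2 + x_{2\nu n}^2 + \cdots + x_{p\nu n}^2 \]

and

\[ \sigma_{i1n}^2 + \sigma_{i2n}^2 + \cdots + \sigma_{inn}^2 = 1. \]

If

\[ x_{i\nu n} = \frac{x_{i\nu}}{s_{in}}, \quad \text{where } s_{in}^2 = \sigma_{i1}^2 + \cdots + \sigma_{in}^2, \]

and the $x_{i\nu n}$ satisfy $\mathfrak{L}_p$ then we shall say that the $x_{i\nu}$ satisfy $\mathfrak{L}_p$.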

Suppose that the random variables $y_{11}, \ldots, y_{pm_p}$ have a normal multivariate
distribution with zero means and with covariance parameters $\sigma_{i\gamma j\delta}$ where

\[ \sigma_{i\gamma j\delta} = E(y_{i\gamma} y_{j\delta}), \quad \gamma = 1, \ldots, m_i; \ \delta = 1, \ldots, m_j, \]

and denote the distribution function of $y_{11}, \ldots, y_{pm_p}$ by $N(y)$. Then we may
state the Laplace-Liapounoff theorem as:

7 It is noted that $Q(y_1, \ldots, y_p)$ is integrated with respect to $F(y_1, \ldots, y_p)$ and
$A(x_1, \ldots, x_n)$ is integrated with respect to $F(x_1, \ldots, x_n)$.

8 See Cramér [3, pp. 57, 60, 114], and the references there given.

9 It is not difficult to show that the Lindeberg condition will be satisfied if moments of
order greater than two exist, [3, p. 60], or if the conditions stated by Lévy [13, p. 207]
and [14, p. 106] are satisfied.





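Theorem I. Suppose that, for each value of $n$, the random variables $x_{i\gamma\nu n}$,
which are independent for different values of $\nu$, have zero means and covariance
parameters $\sigma_{i\gamma j\delta\nu n}$, where

\[ \sigma_{i\gamma j\delta\nu n} = E(x_{i\gamma\nu n} x_{j\delta\nu n}). \]

Denote by $d_n'$ the maximum of the variances $\sigma_{i\gamma i\gamma\nu n}$. If the functions $y_{i\gamma n}$ are
defined by the equations

\[ y_{i\gamma n} = \sum_\nu x_{i\gamma\nu n}, \]

it follows that

\[ \sigma_{i\gamma j\delta n} = E(y_{i\gamma n} y_{j\delta n}) = \sum_\nu \sigma_{i\gamma j\delta\nu n}. \]

If $\lim_{n\to\infty} \sigma_{i\gamma j\delta n} = \sigma_{i\gamma j\delta}$ and if $\lim_{n\to\infty} d_n' = 0$, then a necessary and sufficient condition
that as $n \to \infty$, the limiting distribution 10 of $y_{11n}, \ldots, y_{pm_pn}$ be $N(y)$ is that the
condition $\mathfrak{L}_{pm_p}$ be satisfied.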

The proof of this theorem is omitted. It may readily be developed from the
proofs of Cramér, [3, pp. 57, 113].

Before stating certain corollaries which are of interest, some additional 
definitions are necessary. 

Let $C_m, C_{m+1}, \ldots$ be a sequence of $m$ rowed real matrices

\[ C_n = \| c_{\gamma\nu n} \|, \quad n = m, m + 1, \ldots, \]

and let the greatest of the absolute values of the elements of $C_n$ be denoted by
$d_n$. The inner product of any two rows of $C_n$ will be denoted by $\rho_{\gamma\delta n}$, i.e.

\[ \rho_{\gamma\delta n} = \sum_\nu c_{\gamma\nu n} c_{\delta\nu n}. \]

Let $X_1, X_2, \ldots$ be a sequence of random vectors of $p$ components defined
on $R^p$, and let the components of $X_\nu$ be denoted by $x_{1\nu}, \ldots, x_{p\nu}$. Let the
components of the chance matrix $Y_n = \| y_{i\gamma n} \|$ which has $p$ rows and $m$ columns,
be defined by the equations

\[ y_{i\gamma n} = \sum_\nu c_{\gamma\nu n} x_{i\nu} \tag{2.2} \]

for each value of $n$, ($n = m, \ldots$; $m > p$).

10 The distribution functions $F(X_n)$ will be said to converge to the distribution function
$F(X)$ if and only if

\[ \lim_{n\to\infty} \int_{-\infty}^{X} dF(X_n) = F(X) \]

for every $X$ at which $F(X)$ is continuous. If $F(X)$ is continuous throughout $R^p$, then the
convergence is uniform.





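Suppose that

\[ E(x_{i\nu}) = 0 \tag{2.3} \]

and

\[ E(x_{i\mu} x_{j\nu}) = \sigma_{ij} \delta_{\mu\nu}, \tag{2.4} \]

where $\delta_{\mu\nu} = 1$ if $\mu = \nu$ and $\delta_{\mu\nu} = 0$ if $\mu \ne \nu$. (There should be no confusion of
this use of the letter $\delta$ with its use as an index.) It is easy to see that if the
$c_{\gamma\nu n}$ are real numbers, then

\[ E(y_{i\gamma n}) = 0 \]

and

\[ E(y_{i\gamma n} y_{j\delta n}) = \sigma_{ij} \rho_{\gamma\delta n}. \]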
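The second identity follows by expanding the product of the two linear forms (2.2) and applying (2.4) term by term. A minimal sketch of that expansion is given below; it is an illustration only, the function names are hypothetical, and the small integer matrices `C` and `sigma` are made-up values rather than anything taken from the paper.

```python
def row_inner_product(C, g, d):
    """rho_{g d} = sum over nu of c_{g nu} * c_{d nu}."""
    return sum(cg * cd for cg, cd in zip(C[g], C[d]))

def cov_of_transformed(C, sigma, i, g, j, d):
    """E(y_{i g} y_{j d}) expanded term by term, using
    E(x_{i mu} x_{j nu}) = sigma_ij if mu == nu and 0 otherwise, as in (2.4)."""
    n = len(C[0])
    total = 0
    for mu in range(n):
        for nu in range(n):
            e_xx = sigma[i][j] if mu == nu else 0
            total += C[g][mu] * C[d][nu] * e_xx
    return total

# Hypothetical example values (assumptions for illustration only).
C = [[1, 2, 0, -1], [0, 1, -1, 3]]   # a 2 x 4 coefficient matrix
sigma = [[2, 1], [1, 3]]             # covariance parameters sigma_ij

# Compare the term-by-term expansion with sigma_ij * rho_{g d} for every index choice.
checks = [
    cov_of_transformed(C, sigma, i, g, j, d) == sigma[i][j] * row_inner_product(C, g, d)
    for i in range(2) for j in range(2) for g in range(2) for d in range(2)
]
```

Only the diagonal terms $\mu = \nu$ survive in the double sum, which is why the covariance factors into $\sigma_{ij}$ times the inner product of the two rows of the coefficient matrix.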

Let the determinant of the positive definite symmetric matrix, $(\sigma) = \| \sigma_{ij} \|$
be denoted by $\sigma$. Let the inverse matrix of $(\sigma)$ be denoted by $(\sigma)^{-1} = \| \sigma^{ij} \|$
where $\sigma^{ij}$ is the cofactor of $\sigma_{ij}$ in $(\sigma)$ divided by $\sigma$. The determinant of $(\sigma)^{-1}$
is $\sigma^{-1}$.

By $N_d(x_1, \ldots, x_p; (\sigma))$ we shall mean the normal probability density with
zero means and covariance parameters $\sigma_{ij}$, i.e.,

\[ N_d(x_1, \ldots, x_p; (\sigma)) = (2\pi)^{-p/2} \sigma^{-1/2} \exp\Big[ -\tfrac{1}{2} \sum_{i,j} \sigma^{ij} x_i x_j \Big], \quad (-\infty < x_i < \infty), \]

where $(\sigma)$ is a positive definite matrix. If the random variables $x_1, \ldots, x_p$
have probability density $N_d(X; (\sigma)) \equiv N_d(x_1, \ldots, x_p; (\sigma))$, where $X$ is a vector,
then we shall say that $X$ has a distribution function $N(X; (\sigma))$, i.e.

\[ \frac{\partial^p N(X; (\sigma))}{\partial x_1 \cdots \partial x_p} = N_d(X; (\sigma)) \]

or

\[ \int_{-\infty}^{x_p} \cdots \int_{-\infty}^{x_1} N_d(t_1, \ldots, t_p; (\sigma)) \, dt_1 \cdots dt_p = N(X; (\sigma)). \]


Inasmuch as certain hypotheses will be used on several occasions in this 
paper, they are stated here. 

If $X_1, X_2, \ldots$ are independently distributed, if (2.3) and (2.4) hold and if
the $x$'s satisfy the condition $\mathfrak{L}_p$ then we shall say that $\mathcal{H}_p$ is true.

If $C_n$ is such that, for all $n$, the equations $\rho_{\gamma\delta n} = \delta_{\gamma\delta}$ are true, we shall say
that $\mathcal{C}$ is true.

The following corollary is useful in deriving limiting distributions in the
analysis of variance.

Corollary I. Let $\mathcal{H}_p$ and $\mathcal{C}$ be true. Then a sufficient condition that

\[ \lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)) \]

is $\lim_{n\to\infty} d_n = 0$.





The proof is based on the fact that the $x_{i\gamma\nu n}$ of Theorem I are given by $c_{\gamma\nu n} x_{i\nu}$.
The details are omitted.

The $pm$ rowed square matrix, $(\tau) = \| \tau_{rs} \|$ is defined as follows: If $r \le m$,
$s \le m$; then $\tau_{rs} = \sigma_{11} \rho_{rs}$; and if $km < r \le (k + 1)m$, $lm < s \le (l + 1)m$,
$l, k = 0, \ldots, p - 1$, then $\tau_{rs} = \sigma_{k+1\, l+1} \rho_{r-km\, s-lm}$. The inverse matrix of
$(\tau)$, and the determinants of $(\tau)$ and $(\tau)^{-1}$ are defined as are $(\sigma)^{-1}$, $\sigma$ and $\sigma^{-1}$.

Corollary II. Let $\mathcal{H}_p$ be true, and let

\[ \lim_{n\to\infty} \rho_{\gamma\delta n} = \rho_{\gamma\delta}, \quad \rho_{\gamma\gamma} = 1. \]

Then, if $\lim_{n\to\infty} d_n = 0$, it follows that

\[ \lim_{n\to\infty} F(Y_n) = F(Y), \]

where $F(Y)$ is the distribution function determined by the probability density

\[ (2\pi)^{-pm/2} \tau^{-1/2} \exp\Big[ -\tfrac{1}{2} \sum_{r,s} \tau^{rs} y_{k+1\, r-km} \, y_{l+1\, s-lm} \Big] \]

where, if $r \le m$, $s \le m$, then $k = 0$, $l = 0$; if $r \le m$, $m < s \le 2m$, then $k = 0$,
$l = 1$; and so on.

The proof is omitted. 

If $Z_1, \ldots, Z_t$ are random variables, then $F(X_1, \ldots, X_k \mid Z_1, \ldots, Z_t)$ is
the distribution function of the random vectors $X_1, \ldots, X_k$ for fixed values of
$Z_1, \ldots, Z_t$, i.e. for any fixed values of $Z_1, \ldots, Z_t$,

\[ P\{X_1 < X_1^0, \ldots, X_k < X_k^0\} = F(X_1^0, \ldots, X_k^0 \mid Z_1, \ldots, Z_t). \]

We shall now assume that the elements $c_{\gamma\nu n}$ of the matrix $C_n$ are Borel
measurable functions of a set of random variables 11 $Z_1, \ldots, Z_{t_n}$. Then the matrix
$C_n$ may be called a random matrix defined on a space $W_n$ which is the
combinatory product of the spaces on which $Z_1, \ldots, Z_{t_n}$ are defined. If, for each value
of $n$, and for all $X^n$ and $Z^n$, the equation

\[ F(X^n, Z^n) = F(Z^n) \cdot \prod_\nu F(X_\nu \mid Z^n) \tag{2.5} \]

is satisfied, then we shall say that $\mathcal{S}$ is true. It is obvious that sufficient
conditions for the truth of $\mathcal{S}$ are

\[ F(X^n, Z^n) = F(Z^n) \cdot \prod_\nu F(X_\nu) \]

or, if $t_n > n$,

\[ F(X^n, Z^n) = F(Z_{n+1}, \ldots, Z_{t_n}) \cdot \prod_\nu F(X_\nu, Z_\nu) \]

11 The symbol $X^n$ will stand for the set of variables $X_1, \ldots, X_n$, and the symbol $Z^n$
will stand for the set of variables $Z_1, \ldots, Z_{t_n}$.







or, if $t_n < n$,

\[ F(X^n, Z^n) = \prod_{\nu=1}^{t_n} F(X_\nu, Z_\nu) \cdot \prod_{\nu=t_n+1}^{n} F(X_\nu). \]

Inasmuch as we shall often use Fubini’s theorem, it is now stated here.12

Theorem II. Let the distribution function of $X^n, Z^n$ be $F(X^n, Z^n)$, let the
distribution function of $X^n$ for fixed values of $Z^n$ be $F(X^n \mid Z^n)$, and let the
distribution function of $Z^n$ be $F(Z^n)$. Then if $A(X^n, Z^n)$ is measurable with respect to
$F(X^n, Z^n)$ and if

\[ \int_{R^{pn} \times W_n} |A(X^n, Z^n)| \, dF(X^n, Z^n) < \infty, \]

it follows that

\[ \int_{R^{pn}} |A(X^n, Z^n)| \, dF(X^n \mid Z^n) < \infty \]

for almost all 13 sets of values of $Z^n$ and

\[ \int_{R^{pn} \times W_n} A(X^n, Z^n) \, dF(X^n, Z^n) = \int_{W_n} \Big[ \int_{R^{pn}} A(X^n, Z^n) \, dF(X^n \mid Z^n) \Big] dF(Z^n). \]


In Corollary I an important condition was that the maximum of the absolute
values of the elements of $C_n$ should approach zero as $n$ increased. In order to
obtain a similar condition when the elements of $C_n$ are random variables, we
shall define the function $d(C_n)$ as follows: For each value of $Z^n$ let $d(C_n)$ be the
maximum of the absolute values of the elements of $C_n$. We shall denote
$d(C_n)$ by $d_n$. If the elements of $C_n$ are Borel measurable functions then $d_n$ is a
Borel measurable function of $Z^n$. Hence $d_n$ is a random variable defined on $W_n$.

A sequence of random variables $d_1, d_2, \ldots$ is said to converge in probability
to zero if, given $\epsilon > 0$, then

\[ \lim_{n\to\infty} P\{|d_n| > \epsilon\} = 0. \]

If the sequence of functions $d_p, d_{p+1}, \ldots$ converges in probability to zero we
shall say that $\mathcal{D}$ is true.

If $\mathcal{S}$ is true, and if, for almost all values of $Z^n$ we have

\[ \int_{R^p} x_{i\nu} \, dF(X_\nu \mid Z^n) = 0, \tag{2.6} \]

\[ \int_{R^p} x_{i\nu} x_{j\nu} \, dF(X_\nu \mid Z^n) = \sigma_{ij}, \tag{2.7} \]


12 Proofs of Fubini’s theorem with the required amount of generality will be found in
[5, p. 101] and [14, p. 73].

13 A proposition concerning random variables is said to be true for almost all values of
the variables, if it is true for all values of the variables, except perhaps for a set of
probability zero with respect to the distribution function of the random variables.





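and the condition $\mathfrak{L}_p$ is satisfied with respect to the $X_\nu$ and the distribution
functions $F(X_\nu, Z^n)$ then we shall say that $\mathcal{H}_p^0$ is true.

If

\[ \sum_\nu \int_{R^p \times W_n} c_{\gamma\nu n} c_{\delta\nu n} x_{i\nu} x_{j\nu} \, dF(X_\nu, Z^n) = \sigma_{ij} \delta_{\gamma\delta}, \tag{2.8} \]

then we shall say that $\mathcal{C}^0$ is true. It is noted that if $\mathcal{S}$ and (2.7) are true, then
$\mathcal{C}^0$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $Z^n$.

Corollary III. Let $\mathcal{C}^0$, $\mathcal{S}$ and $\mathcal{H}_p^0$ be true. Then, if $\mathcal{D}$ is true, it follows that

\[ \lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)). \]

Proof. It is necessary to show that the condition $\mathfrak{L}_{pm}$ is satisfied by the
variables $c_{\gamma\nu n} x_{i\nu}$ if the condition $\mathfrak{L}_p$ is satisfied by the variables $x_{i\nu}$ and that the
condition $\mathcal{D}$ implies that $\lim_{n\to\infty} d_n' = 0$ when the $x_{i\gamma\nu n}$ of Theorem I are set equal
to the $c_{\gamma\nu n} x_{i\nu}$ of Corollary III.

If we let $A_{\nu n}^2 = \sum_{\gamma,i} (c_{\gamma\nu n} x_{i\nu})^2$, $A_n^2 = \sum_\nu A_{\nu n}^2$ and let $s_n^2 = E(A_n^2)$, then, by (2.8),
it is true that

\[ s_n^2 = \sum_{\gamma,i} \sigma_{ii} = m \sum_i \sigma_{ii}. \]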

From 31 p and the fact that for sufficiently large n, | d\ ( Z n ) | < 1 for almost all 
Z n we have for any preassigned e and 5, 


T ( A \dF(X n , ZT) < 4 23 f tnd\(Z K ) 23 x if dF{X„ Z n ) < 6 

8 n *&%>*•% ** p J & n >«** i 


for sufficiently large n, since the set of x’& and Z n for which > ts n con- 

i,p 

tains almost all the x f & and Z n for which A„ > ts n . Hence, the condition 
£ pm is satisfied by the random variables c 7 , n z** with respect to the distribution 
functions F(X, , Z n ). 

We now show that

\[ \lim_{n\to\infty} [\max E\{(c_{\gamma\nu n} x_{i\nu})^2\}] = 0. \]

It is clearly true that

\[ E[(c_{\gamma\nu n} x_{i\nu})^2] \le \int d_n^2 x_{i\nu}^2 \, dF(X_\nu, Z^n). \]


Since d n converges in probability to zero, and since eft < 1 for almost all Z, 
we can, for any e > 0, take no so large that if n > «o , then P{eft > i*} < if. 
If E is the set on which <ft > ie, we then have for all n > no , using (2.7), 


E{(c ym x i ,) i } £ /.[/., x\,dF{X,\Z n )^dF{r) 

+ *2 J w [ /„ dF < X ’ 12 ")] dF(F) £ W n 


and this inequality is also satisfied for all n > no . 





The following discussion is useful in obtaining the limiting distributions of 
statistics which occur in multivariate statistical analysis. 

The letter $f$ will assume all integral values from 1 through $s$, the letters $\mu, \nu$
will assume all integral values from 1 through $n_f$, and the letters $\gamma, \delta$ will assume
all integral values from 1 through $m_f$, for any $f$.

Let $X_1^f, X_2^f, \ldots$ be, for any fixed $f$, a sequence of random vectors of $p_f$
components defined on $R^{p_f}$, and let the set of random variables $X_1^f, X_2^f, \ldots$ be independently
distributed for any fixed $f$.

If, for each set of values of $n_1, \ldots, n_s$, ($t_n$ is a function of $n_1, \ldots, n_s$),

\[ F(X^1, \ldots, X^s, Z_1, \ldots, Z_{t_n}) = \prod_{f,\nu} F(X^f_\nu \mid Z_1, \ldots, Z_{t_n}) \cdot F(Z_1, \ldots, Z_{t_n}), \]

we shall say that $\mathcal{S}_s$ is true.

Let, for any fixed value of $f$, the matrix 14 $C_n^f = \| c^f_{\gamma\nu n} \|$ where the $c^f_{\gamma\nu n}$ are Borel
measurable functions of $X^k$, ($k < f$), and 15 $Z^n$, have the same properties as
$C_n$, and let $d(C_n^f)$ be the same function of $C_n^f$ that $d(C_n)$ is of $C_n$. We shall
denote $d(C_n^f)$ by $d_n^f$.

Let

\[ y^f_{i\gamma n} = \sum_\nu c^f_{\gamma\nu n} x^f_{i\nu} \]

and let $Y_n^f = \| y^f_{i\gamma n} \|$.

For fixed $f$, the $p_f$ rowed square matrix $(\sigma_f)$, its inverse, and so on are defined
as were the same functions of the $\sigma_{ij}$ earlier in this paragraph but with $\sigma_{ijf}$
replacing $\sigma_{ij}$, where

\[ E\{x^f_{i\nu}\} = 0 \]

and

\[ E\{x^f_{i\nu} x^f_{j\nu}\} = \sigma_{ijf}. \]

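If $\mathcal{S}_s$ is true, and if for almost all values of $Z^n$ we have

\[ \int_{R^{p_f}} x^f_{i\nu} \, dF(X^f_\nu \mid Z^n) = 0, \tag{2.9} \]

\[ \int_{R^{p_f}} x^f_{i\nu} x^f_{j\nu} \, dF(X^f_\nu \mid Z^n) = \sigma_{ijf}, \tag{2.10} \]

and the condition $\mathfrak{L}_{p_f}$ is satisfied with respect to the $X^f_\nu$ and the distribution
functions $F(X^f_\nu, Z^n)$ then we shall say that $\mathcal{H}^0_{p_f}$ is true.

If

\[ \sum_\nu \int c^f_{\gamma\nu n} c^f_{\delta\nu n} x^f_{i\nu} x^f_{j\nu} \, dF(X^f_\nu, Z^n) = \sigma_{ijf} \delta_{\gamma\delta}, \tag{2.11} \]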


14 The superscripts $f$ and $k$ will not indicate multiplication but will only be indices.

15 See footnote 11.





then we shall say that $\mathcal{C}^f$ is true. It is noted that if $\mathcal{S}_s$ and (2.10) are true
then $\mathcal{C}^f$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $X^1, \ldots, X^{f-1}$,
$Z^n$.

If $d_n^f$ converges in probability to zero as $n$ increases we shall say that $\mathcal{D}_f$ is
true.

Corollary IV. Let $\mathcal{C}^f$, $\mathcal{S}_s$, and $\mathcal{H}^0_{p_1}, \ldots, \mathcal{H}^0_{p_s}$ be true. Then, if $\mathcal{D}_1, \ldots, \mathcal{D}_s$
are true, it follows that

\[ \lim F(Y^1_{n_1}, \ldots, Y^s_{n_s}) = \prod_f F(Y^f), \]

where

\[ F(Y^f) = \prod_\gamma N(y^f_{1\gamma}, \ldots, y^f_{p_f\gamma}; (\sigma_f)). \]

The proof is almost identical with the proof of Corollary III of which this
corollary is an extension.

It is remarked that if the statistics, the limiting distributions of which are 
desired, are associated with the normal distribution, as are most statistics 
studied, then Corollary IV may not be the best tool to use. This is a conse¬ 
quence of the fact that such statistics are generally expressible as functions of 
uncorrelated random variables and hence are more simply discussed, using 
Corollary I. 

3. Limiting distributions of quadratic and bilinear forms. We first assume
the coefficients of the forms to be constants. For each set of values of $i$, $j$, and
$n$, the matrix of the bilinear form with coefficients which are real numbers,

\[ b^n_{ij} = \sum_{\mu,\nu} a_{\mu\nu n} x_{i\mu} x_{j\nu}, \tag{3.1} \]

will be denoted by $A_n$, and the rank of $A_n$ will be denoted by $m$. The maximum
of the absolute values of the elements of $A_n$ will be denoted by $b_n$. We shall
assume that there exists an orthogonal transformation,

\[ y_{i\gamma n} = \sum_\nu c_{\gamma\nu n} x_{i\nu}, \tag{3.2} \]

of $x_{i1}, \ldots, x_{in}$ such that

\[ b^n_{ij} = \sum_\gamma \lambda_\gamma y_{i\gamma n} y_{j\gamma n}, \tag{3.3} \]

where the coefficients $\lambda_\gamma$ are non-negative.16

Lemma I. If $d_n$ is the maximum of the absolute values of the elements $c_{\gamma\nu n}$
then a necessary and sufficient condition that $\lim b_n = 0$ is $\lim d_n = 0$.

16 Our theorems will not be applicable if some of the $\lambda_\gamma$ are negative and some are positive.
However if all the $\lambda_\gamma$ are non-positive then the theorems will remain true.





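Proof: From (3.1) it follows that

\[ a_{\mu\nu n} = \sum_\gamma \lambda_\gamma c_{\gamma\mu n} c_{\gamma\nu n}. \]

Hence, $b_n \ge a_{\mu\mu n} \ge \lambda_\gamma c_{\gamma\mu n}^2$ and $|a_{\mu\nu n}| \le d_n^2 (\sum_\gamma \lambda_\gamma)$. The remainder of the proof
is obvious.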
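The two inequalities in the proof of Lemma I can be seen at work in a small numerical sketch. The example below is an illustration under stated assumptions, not the author's construction: it uses two orthonormal rows of length $n$ (a mean row and a half-against-half contrast, with $n$ even), made-up non-negative coefficients, and hypothetical helper names.

```python
import math

def contrast_rows(n):
    """Two orthonormal rows of length n (n even): the mean row and a half-vs-half contrast."""
    s = 1.0 / math.sqrt(n)
    mean_row = [s] * n
    contrast = [s] * (n // 2) + [-s] * (n // 2)
    return [mean_row, contrast]

def form_matrix(C, lam):
    """a_{mu nu} = sum over gamma of lambda_gamma * c_{gamma mu} * c_{gamma nu}."""
    n = len(C[0])
    return [[sum(l * row[mu] * row[nu] for l, row in zip(lam, C)) for nu in range(n)]
            for mu in range(n)]

def max_abs(M):
    """Largest absolute value among the entries of a matrix given as a list of rows."""
    return max(abs(v) for row in M for v in row)

def lemma_bounds(n, lam):
    """Return (max of lambda_gamma * c^2, b_n, d_n^2 * sum of lambda) for the rows above."""
    C = contrast_rows(n)
    A = form_matrix(C, lam)
    b_n, d_n = max_abs(A), max_abs(C)
    lower = max(l * c * c for l, row in zip(lam, C) for c in row)
    upper = d_n ** 2 * sum(lam)
    return lower, b_n, upper
```

For these rows every element has absolute value $1/\sqrt{n}$, so $b_n$ is squeezed between two quantities of order $1/n$ and tends to zero together with $d_n$, as the lemma asserts.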

The following theorem will be the basis for a large sample analogue of
Wishart’s distribution.

Theorem III. Let $\mathcal{H}_p$ be true. Then, a sufficient condition that

\[ \lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)), \]

where $b^n_{ij} = \sum_\gamma \lambda_\gamma y_{i\gamma n} y_{j\gamma n}$, is $\lim_{n\to\infty} b_n = 0$.

Proof. According to Lemma I, the fact that $\lim_{n\to\infty} b_n = 0$ implies that
$\lim_{n\to\infty} d_n = 0$. The $y_{i\gamma n}$ are such that $\mathcal{C}$ is true. Hence the hypotheses of
Corollary I are satisfied and the theorem is proved.

Before stating the corollary to Theorem III, we shall prove an obvious lemma
which is of constant service.

Lemma II. Let $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of $F(X)$, and let

\[ g_{1n} = g_1(x_{1n}, \ldots, x_{pn}), \ \ldots, \ g_{kn} = g_k(x_{1n}, \ldots, x_{pn}) \]

be Borel measurable functions of their indicated variables for each value of $n$,
($p \ge k$), defined on $R^p$.

Then

\[ \lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k) \]

at all points of continuity of $F(g_1, \ldots, g_k)$, where $g_\alpha = g_\alpha(x_1, \ldots, x_p)$.

Proof. By (2.1), we have

\[ E[e^{i \sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}] = E[e^{i \sum_\alpha t_\alpha g_{\alpha n}}], \tag{3.4} \]

where since $g_\alpha(x_1, \ldots, x_p)$ is a Borel measurable function of $x_1, \ldots, x_p$ we
know that $g_{1n}, \ldots, g_{kn}$ have a joint distribution function $F(g_{1n}, \ldots, g_{kn})$.
Then, since $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of $F(X)$ we have 17

\[ \lim_{n\to\infty} E[e^{i \sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}] = E[e^{i \sum_\alpha t_\alpha g_\alpha(x_1, \ldots, x_p)}] \]

uniformly in every $t_1, \ldots, t_p$ interval since

\[ \big| E[e^{i \sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}] - E[e^{i \sum_\alpha t_\alpha g_\alpha(x_1, \ldots, x_p)}] \big| \le \int |dF_n(X_1, \ldots, X_p) - dF(X_1, \ldots, X_p)|, \]

17 See Cramér, [8, p. 80] and “Additional Note” at the end of the book.





where $F_n(X_1, \ldots, X_p)$ stands for $F(X_{1n}, \ldots, X_{pn})$, when $X_i$ and $X_{in}$ have
the same numerical values. It follows from (3.4), that

\[ \lim_{n\to\infty} E[e^{i \sum_\alpha t_\alpha g_{\alpha n}}] = E[e^{i \sum_\alpha t_\alpha g_\alpha}] \]

uniformly in every $t_1, \ldots, t_p$ interval, and consequently

\[ \lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k) \]

at all points of continuity of $F(g_1, \ldots, g_k)$.

The real valued function $G_d(x; n, c)$ will be defined by the equations

\[ G_d(0; 0, c) = 1, \quad (-\infty < c < \infty), \]

\[ G_d(x; n, c) = [\Gamma(\tfrac{1}{2}n)]^{-1} (2c)^{-\frac{1}{2}n} x^{\frac{1}{2}n - 1} \exp\Big[ -\frac{x}{2c} \Big], \quad (0 < x < \infty;\ c > 0;\ n > 0), \]

and $G_d(x; n, c) = 0$ otherwise. The function $G(x; n, c)$ will be defined by the
equation

\[ G(x; n, c) = \int_0^x G_d(t; n, c) \, dt. \]
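For $n > 0$ and $c > 0$ the density $G_d(x; n, c)$ is that of $c$ times a chi-square variable with $n$ degrees of freedom, so $G$ should tend to one and the mean should be $nc$. The sketch below checks this numerically; it is an illustration only, the function names are hypothetical, and the midpoint rule is simply one convenient way to approximate the integral (it is adequate for $n \ge 2$, where the density is bounded near zero).

```python
import math

def g_density(x, n, c):
    """G_d(x; n, c) as defined in the text."""
    if n == 0:
        return 1.0 if x == 0 else 0.0
    if x <= 0 or c <= 0 or n < 0:
        return 0.0
    half = 0.5 * n
    return x ** (half - 1.0) * math.exp(-x / (2.0 * c)) / (math.gamma(half) * (2.0 * c) ** half)

def g_cdf(x, n, c, steps=20000):
    """G(x; n, c): midpoint-rule approximation of the integral of G_d from 0 to x."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(g_density((k + 0.5) * h, n, c) for k in range(steps))

def g_mean(n, c, upper, steps=20000):
    """Midpoint-rule approximation of the integral of t * G_d(t; n, c) from 0 to upper."""
    h = upper / steps
    return h * sum((k + 0.5) * h * g_density((k + 0.5) * h, n, c) for k in range(steps))
```

With $n = 2$ and $c = 1$ the density reduces to $\tfrac{1}{2} e^{-x/2}$, which gives a closed form to compare against.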

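The real valued function $G_d(x_{11}, x_{12}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the
equations

\[ G_d(0, \ldots, 0; 0, (\sigma)) = 1, \]

\[ G_d(x_{11}, \ldots, x_{pp}; n, (\sigma)) = 2^{-np/2} \pi^{-p(p-1)/4} \sigma^{-n/2} \Big[ \prod_i \Gamma\big(\tfrac{1}{2}(n - i + 1)\big) \Big]^{-1} |x|^{\frac{1}{2}(n-p+1)-1} \exp\Big[ -\tfrac{1}{2} \sum_{i,j} \sigma^{ij} x_{ij} \Big], \]

$(0 < x_{ii} < \infty;\ x_{ij}^2 < x_{ii} x_{jj})$; $(\sigma)$ is positive definite,

where $|x|$ is the determinant $|x_{ij}|$ and $G_d(x_{11}, \ldots, x_{pp}; n, (\sigma)) = 0$ otherwise.
The function $G(x_{11}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the equation

\[ G(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \int_{-\infty}^{x_{pp}} \cdots \int_{-\infty}^{x_{11}} G_d(t_{11}, t_{12}, \ldots, t_{pp}; n, (\sigma)) \, dt_{11} \, dt_{12} \cdots dt_{pp}. \]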

We can now state the limiting distribution analogue of Wishart’s distribution.

Corollary V. If $\mathcal{H}_p$ is true, if $\lambda_\gamma = 1$, and if $m > p$ then

\[ \lim_{n\to\infty} F(b^n_{11}, b^n_{12}, \ldots, b^n_{pp}) = G(b_{11}, \ldots, b_{pp}; m, (\sigma)). \]

Proof. The conditions of Theorem III and Lemma II are satisfied.

Obviously for fixed $i$, the limiting distribution of $b^n_{ii}$ is $G(b; m, \sigma_{ii})$, and if
$i \ne j$, the limiting distribution of $b^n_{ij}/m$ is the distribution of the covariance of
$x_i$ and $x_j$ in a sample of $m$ independent pairs of observations.18
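A rough simulation can make the case $p = 1$ concrete: with non-normal variables and an orthonormal coefficient matrix whose elements are uniformly small, the quadratic form should behave approximately like $\sigma_{11}$ times a chi-square variable with $m$ degrees of freedom (mean $m\sigma_{11}$, variance $2m\sigma_{11}^2$). The sketch below is an illustration under assumptions that are not in the paper: the rows are taken from a Sylvester-Hadamard matrix scaled by $1/\sqrt{n}$, the variables are uniform with unit variance, and all names, the seed and the sample sizes are hypothetical choices.

```python
import math
import random

def hadamard_rows(m, n):
    """First m rows of a Sylvester-Hadamard matrix of order n (n a power of two),
    scaled by 1/sqrt(n) so that the rows are orthonormal and every element is +-1/sqrt(n)."""
    s = 1.0 / math.sqrt(n)
    return [[s * (-1) ** bin(g & v).count("1") for v in range(n)] for g in range(m)]

def simulate_b(m, n, reps, seed=0):
    """Simulate b = sum over gamma of y_gamma^2 with y_gamma = sum over nu of c_{gamma nu} x_nu."""
    rng = random.Random(seed)
    C = hadamard_rows(m, n)
    half_width = math.sqrt(3.0)   # uniform on (-sqrt(3), sqrt(3)) has mean 0 and variance 1
    values = []
    for _ in range(reps):
        x = [rng.uniform(-half_width, half_width) for _ in range(n)]
        y = [sum(c * xv for c, xv in zip(row, x)) for row in C]
        values.append(sum(v * v for v in y))
    return values

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var
```

With $m = 3$ and unit variance, the simulated mean and variance should fall near 3 and 6 respectively; the agreement is only approximate for finite $n$ and a finite number of replications.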


18 See Wishart and Bartlett, [1, p. 386].




We proceed to the analogue for limiting distributions of one of our
generalisations of the Fisher-Cochran theorem. It is first desirable to give some
additional definitions.

We consider the bilinear forms

\[ b^n_{ij\alpha} = \sum_{\mu,\nu} a^\alpha_{\mu\nu n} x_{i\mu} x_{j\nu} \tag{3.5} \]

with real coefficients, and we denote the matrix of $b^n_{ij\alpha}$ by $A^\alpha_n$. The rank of
$A^\beta_n$ is $m_\beta$, and the rank of $A^k_n$ is $m_{kn}$. If the maximum of the absolute values
of the elements of $A^1_n, \ldots, A^{k-1}_n$ is $b_n$, and if there exists an orthogonal
transformation,

\[ y_{i\gamma n} = \sum_\nu c_{\gamma\nu n} x_{i\nu}, \tag{3.6} \]

of $x_{i1}, \ldots, x_{in}$ such that

\[ b^n_{ij\alpha} = \sum_\delta \lambda_\delta y_{i\delta n} y_{j\delta n}, \]

where $\delta$ assumes all integral values from $m_1 + \cdots + m_{\alpha-1} + 1$ through
$m_1 + \cdots + m_\alpha$ and $\lambda_\delta$ is non-negative, then it is easy to prove, as in Lemma I,
that a necessary and sufficient condition that $\lim_{n\to\infty} b_n = 0$ is $\lim_{n\to\infty} d_n = 0$, where
$d_n$ is the maximum of the absolute values of the elements $c_{\gamma\nu n}$.

Lemma III. Let $m = m_1 + \cdots + m_{k-1}$ and let

\[ \sum_\alpha b^n_{ij\alpha} = \sum_\nu x_{i\nu} x_{j\nu}. \tag{3.7} \]

Then, a necessary and sufficient condition that

\[ b^n_{ij\alpha} = \sum_\delta y_{i\delta n} y_{j\delta n}, \]

where the real linear functions, $y_{i\gamma n}$, of $x_{i1}, \ldots, x_{in}$ are given by (3.6), the linear
functions (3.6) not now being assumed to be orthogonal, is

\[ m_{kn} = n - m. \]

Furthermore, the functions (3.6) are orthogonal.

The proof of this lemma for the case $p = 1$ is given in [16]. The procedure
to follow in extending the lemma to the cases where $p > 1$, is given in [15, p.
473]. It is noted that this lemma is more general than the lemma in [15]
inasmuch as we show that the orthogonality of the transformation is a
consequence of our hypotheses and not one of the hypotheses.19


19 It is noted, however, that the increase in generality affects only the necessity not
the sufficiency of the theorem.





Theorem IV. Let $\mathcal{H}_p$, (3.7) and (3.8) be true for all values of $n$, and suppose
that $\lim_{n\to\infty} b_n = 0$. Then

\[ \lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)), \]

where $b^n_{ij\alpha} = \sum_\delta y_{i\delta n} y_{j\delta n}$.

The proof is omitted.

Corollary VI. If the hypotheses of Theorem IV are assumed, and if $m_\beta > p$;
($\beta = 1, \ldots, h$; $h < k$), then


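\[ \lim_{n\to\infty} F(b^n_{111}, \ldots, b^n_{pph}, y_{1\,u+1\,n}, \ldots, y_{pmn}) = \prod_{\gamma=1}^{h} G(b_{11\gamma}, \ldots, b_{pp\gamma}; m_\gamma, (\sigma)) \cdot \prod_{\gamma=u+1}^{m} N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)). \]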

If $p = 1$ in Theorem IV and Corollary VI, we have the large sample analogue
of the Fisher-Cochran theorem.

We now discuss limiting distributions of random variables which are bilinear
and quadratic forms in one set of chance variables for fixed values of other
random variables. We consider the coefficients $a_{\mu\nu n}$ and $a^\alpha_{\mu\nu n}$ of $b^n_{ij}$ and $b^n_{ij\alpha}$ to be
random variables. Hence the matrices $A_n$ and $A^\alpha_n$ are random matrices.

To be more explicit, let $X^f_1, X^f_2, \ldots$ be a sequence of random vectors, the
random vector $X^f_\nu$ having $p_f$ components $x^f_{1\nu}, \ldots, x^f_{p_f\nu}$, and being defined on
$R^{p_f}$. The set of random vectors $X^f_\nu$, and $Z_1, \ldots, Z_{t_n}$ will be assumed to be
independent.

For each value of / the coefficients of the bilinear forms 

(3.9) &</«/ = , (if j — 1, • • •, Pf ; a — 1, • • • , A/) 

will be assumed to be Borel measurable functions of the random vectors 
Xl, ••• .Xj-'andZi, ... ,Z tH . 

The matrix of 6"/,/ is denoted by Al l t . The rank of A p/ is and the rank 
of A k Jj is Mk/nf for all sets of values of the a^ a / except, perhaps, on a set E K/ 

which is such that lim P(E n/ ) — 0. 

*/-*« 

Let the function b(A7 f ) be defined as follows: 

For each set of values of the X p and Z let b(A P/ ) be the maximum of the abso¬ 
lute values of the elements of A f f f . We shall denote b(A Pf ) by \?l ,. Obviously, 
htff is a Borel measurable function of and Z. Hence 

- b(A") 

is a random variable defined on W X fl" 1 ' ,I+ ' +n ‘ p '. 





For each value of f, and for almost all sets of fixed values of the Xj, (h *= 
1, • • • , / — 1), we shall assume that there exists an orthogonal transformation, 

(8.10) *'£»»/*<>) 

r 

of xfi, ..., Zi», such that 10 

( 3 . 11 ) &?/«/ * l 

where X assumes all integral values from mu .+ ••• + »»«-1 / + 1 through 
mu + • • • + m<u . The coefficients of the linear forms (3.10) are real 
single valued Borel measurable functions of the coefficients a£,«/ of the bilinear 
forms (3.9) for fixed values of the Xj and Z”. Let (!?,%, be the same function 
of the functions a£, a / that c£, n/ is of the coefficients of the bilinear formB having 
constant coefficients. Furthermore, let di, be the same function of the matrix 
C f n , = || <i, n{ || where m - m xl + • • • + m k/ -i / , that K f , is of A *,. 

Lemma IV. A necessary and sufficient condition that $b^f_{n_f}$ converge in probability
to zero as $n$ increases is that $d^f_{n_f}$ converge in probability to zero as $n$ increases.
Proof. Since

kf-l 

a *lpi ~ 1] ai/m/Cln,, 


we have 


and 


kf-l 


(*/ - 1 )bi, >Ia^> [<£,»/ 


ML, | < {£ [cCvf-Z [cL/l* < m af [di,}\ 

where X assumes all integral values from mu + • • • + m«_i / + 1 through 
mu + • • • + m„f . The remainder of the proof is obvious. 

In proving Theorem V we shall use a generalization of Lemma III which is 
proved in [15, p. 473]. 

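Theorem V. Let $\mathcal{H}^0_{p_1}, \ldots, \mathcal{H}^0_{p_s}$ be true, and suppose that

\[ \sum_\alpha b^{n_f}_{ij\alpha f} = \sum_\nu x^f_{i\nu} x^f_{j\nu}. \]

Then, if $b^f_{n_f}$ converges in probability to zero as $n$ increases and if $m_f = n_f - m_{k_f n_f}$,
for all values of $n_f$, it follows that

\[ \lim F(y^1_{11n_1}, \ldots, y^s_{p_s m_s n_s}) = \prod_f \prod_\gamma N(y^f_{1\gamma}, \ldots, y^f_{p_f\gamma}; (\sigma_f)). \]

The proof is omitted.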


20 It is not necessary that the $\lambda$ be set equal to one as in (3.11). It is only somewhat
easier to state the results.





Corollary VII, If wt«/ > P / , then 

lim F(WAi, • • •, bp 9 tPt k 9 ~i «) =* II ll QQ>iw, • • •, b PfP/ fif; (*0). 

m.*• *.*#-*«o / i**i 

The proof is omitted. 

Finally, let us assume that the vectors $X^f_\nu$, for fixed $\nu$ are uncorrelated and
for fixed $f$ are independent. By that, we shall mean that $E(x^f_{i\nu} x^g_{j\nu}) = \sigma_{ijf} \delta_{fg}$
and that for all $n$ the set of random vectors $X^f_\nu$ are independent for the same or
different superscripts providing the subscripts are all different. Let us also
assume that the coefficients of the forms (3.9) are real numbers. Thus we have
weakened the hypotheses of Theorem V concerning the random vectors, and we
have strengthened the hypotheses of Theorem V concerning the forms (3.9).
Inasmuch as we are generally concerned with the limiting distributions of
statistics which occur in the analysis of the normal distribution, and many such
statistics have been shown to be invariant under transformations into
uncorrelated random variables, 21 Theorem VI and Corollary VIII will often be
applicable.

Theorem VI. The statement of Theorem V is repeated.

Corollary VIII. The statement of Corollary VII is repeated.

Another extension of these theorems may be obtained by allowing all the
$n_f$ to be equal, i.e. $n_1 = \cdots = n_s = n$, and by putting conditions on the forms
(3.9) which enable us to say that for fixed $i$, $f$, $\mu$ and $n$, the set of random variables
are independently distributed. Theorem I could then be used to obtain 
a very general result. However, except for the case dealt with above, the con¬ 
dition of independence appears to be rather restrictive, and the theorem is 
omitted. 

4. Applications. We first state the strong law of large numbers and a 
lemma which is very useful in the discussion of limiting distributions. 

A sequence of random variables $X_1, X_2, \ldots$ will be said to converge with
probability one 22 to a random variable $X$ if

\[ \lim_{n\to\infty} P\{|X_n - X| < \epsilon, |X_{n+1} - X| < \epsilon, \ldots, |X_{n+p} - X| < \epsilon\} = 1 \]

for every value of $p > 0$, uniformly in $p$ for every positive number $\epsilon$. Upon
setting $p = 1$, it is seen that convergence with probability one implies
convergence in probability.

The strong law of large numbers 23 asserts that if the independent random
variables $X, X_1, \ldots$ all have the same distribution function, and if $E(X)$ is
finite, then the sequence of arithmetic means $\frac{1}{n} \sum_\nu X_\nu$ converges with
probability one to $E(X)$.

21 The regression transformation which yields the uncorrelated variables will be found
in [15, p. 476, (3.2)].

22 See Doob [4, p. 163], and Fréchet, [9, p. 228].

23 See Doob [4, p. 163], and Fréchet, [9, p. 259]. A complete proof is given by Fréchet.





Hence, if E{x<,) = 0 and if <r< ; - is finite, then - £ x,>x/, *= converges with 

fl 

probability one to *<,•. Since 23 (*<• — £<»)(*/» — $,-•) «= £ x*®* — niijtj% 

* » 

where £<» is the arithmetic mean of x a , - • ■ , , and since £,-» converges with 

probability one to sero, it follows that = s\ ,« — £<»£;„ converges with 
probability one to . It is, of course, assumed that the random variables 
Xi,, x,> have the same joint distribution function for all values of v, and that 
the random vectors Xi, • • • are independently distributed. The process of the 
reduction of «,-,•» to «<,„ in the limit, is an example of the possible uses of: 

Lemma V. If $\varphi(t_1, \cdots, t_p)$ is a continuous function of $t_1, \cdots, t_p$, and if the sequence of random variables $x_{in}$ converges in probability (with probability one) to $x_i$, which may be a random variable or a constant, then the sequence of random variables $\varphi(x_{1n}, \cdots, x_{pn})$ converges in probability (with probability one) to $\varphi(x_1, \cdots, x_p)$, where some or all of the $x$'s may be constants. If $x_1, \cdots, x_p$ are constants then $\varphi(t_1, \cdots, t_p)$ need only be continuous in the neighborhood of $x_1, \cdots, x_p$ and Borel measurable.

For a proof of part of this lemma, which may be extended to yield the entire proof, see Fréchet [9, p. 178].

Using Lemma V it is easy to see that the coefficients $r_n$ of least squares equations converge with probability one to their 0 values, where the 0 value is obtained by substituting $\sigma_{ij}$ for $s_{ijn}$ in the expression for $r_n$, assuming, of course, independent random vectors which have the same distribution functions.

Since problems in the analysis of variance may be interpreted as problems in 
least squares the above comments and Lemma V will generally make it possible, 
when determining limiting distributions, to consider the statistics to be func¬ 
tions of deviations from “true” mean functions rather than “sample” mean 
functions. 

We shall discuss, briefly, four applications of these results. 

(a). The limiting distribution of the regression coefficient. Let r„ , the “sample” 
regression coefficient, be defined by the equation 

23 

r„ = 


where x» and x,-, are deviations from arithmetic means. If the random vectors 
(xt,, Xj,) are independently distributed for fixed i, j, with the same distribution 
functions, and if 2?(x<,) = iE(x,>) = 0, F(x<^r„) = <r<, , then it follows from the 
strong law of large numbers that £ x*x ,>/n converges to <r<, with probability 

9 

one, and from the Laplace-Liapounoff theorem that £ x^ Jr /Vn has a normal 


limiting distribution with mean <r<y and variance 2?{x<»x,> — *<,)*}. Hence, by 
Lemma V, y/n (*- 2 ) has a normal limiting distribution with mean sero 

and variance lim E{n[r u — — ) > unless that limit does not exist. 

*-*• 1, \ v«/ ) ' 





WILLIAM G. MADOW 


If the ti, ftffi not random variables then, in order to apply Corollary I with 
p ** 1, it is necessary that 

Xim 


( 4 . 1 ) 


~.(Z4)*- a 


In that case, the limiting distribution of (X **»)*•»■„ is normal with aero mean 

9 

and variance <x,,. If ( 4 . 1 ) is not satisfied then there is no assurance, unless 
the Xj, are normally distributed, that the limiting distribution of ( 2 J x*,)V» 

r 

is normal. 

(b). The limiting distribution of the analysis of variance ratio. The tests of significance which occur in the analysis of variance depend on the ratio of two quadratic forms, $q_{1n}$ and $q_{2n}$, the denominator $q_{2n}$ having rank (or degrees of freedom) $m_{2n}$ increasing with $n$, and the numerator $q_{1n}$ having rank $m_1$ not changing with $n$, i.e.,

$$v_n = \frac{q_{1n}\, m_{2n}}{q_{2n}\, m_1}$$

where $q_{1n} + q_{2n} + q_{3n} = \sum_{\nu} x_\nu^2$ and $q_{3n}$ is a quadratic form of rank $m_{3n}$ which will be identically zero if $n = m_1 + m_{2n}$. Since$^{24}$ $q_{2n}$ is expressible as the variance of $x$ about a least squares equation it follows from the previous discussion and Lemma IV that $q_{2n}/m_{2n}$ converges with probability one to $\sigma^2$ under the assumptions that the $x_\nu$ are independently distributed with zero means and variances $\sigma^2$. Hence the limiting distribution of $v_n$ will depend only on the limiting distribution of $q_{1n}$ and it will consequently be necessary to consider only the matrix of $q_{1n}$, in order to apply Corollary VI with $p = 1$. For example,$^{25}$ if there are $pn$ independently distributed random variables $x_{i\nu}$ with zero means and variances $\sigma^2$ arranged in $p$ blocks of $n$ random variables each, then

$$\sum_{i,\nu} (x_{i\nu} - \bar{x}_n)^2 = n \sum_{i} (\bar{x}_{in} - \bar{x}_n)^2 + \sum_{i,\nu} (x_{i\nu} - \bar{x}_{in})^2,$$

where $\bar{x}_{in}$ is the arithmetic mean of $x_{i1}, \cdots, x_{in}$ and $\bar{x}_n$ is the arithmetic mean of all the $x_{i\nu}$. Then

$$q_{1n} = n \sum_{i} (\bar{x}_{in} - \bar{x}_n)^2, \qquad m_1 = p - 1,$$

$$q_{2n} = \sum_{i,\nu} (x_{i\nu} - \bar{x}_{in})^2, \qquad m_{2n} = p(n - 1)$$

24 This has been proved by Kolodziejczyk [12, p. 161].

25 Other schemes are given in Fisher [8].





and the matrix of $q_{1n}$ may be obtained by substituting for the $\bar{x}_{in}$ and $\bar{x}_n$. In this case it is sufficient to express $q_{1n}$ as $\sum_{i,j=1}^{p} a_{ij} S_i S_j$, where $S_i = \sum_{\nu} x_{i\nu}$, $a_{ii} = (p - 1)/pn$ and, if $i \neq j$, $a_{ij} = -1/pn$, to see that the condition that the maximum of the absolute values of the elements of the matrix of $q_{1n}$ approaches zero as $n$ increases is satisfied. Hence, if the $x_{i\nu}$ satisfy the condition $\mathcal{L}$, the limiting distribution of $m_1 v_n$ is $G(v; p - 1, 1)$.
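The block decomposition and the quadratic-form expression for the between-blocks sum of squares can be checked numerically. The following is a minimal sketch on made-up data, assuming the readings $q_{1n} = n \sum_i (\bar{x}_{in} - \bar{x}_n)^2$, $q_{2n} = \sum_{i,\nu} (x_{i\nu} - \bar{x}_{in})^2$, $a_{ii} = (p-1)/pn$ and $a_{ij} = -1/pn$; the function names are illustrative only.

```python
def block_decomposition(blocks):
    """blocks: p lists of n numbers each. Returns (total, q1, q2)."""
    p, n = len(blocks), len(blocks[0])
    grand = sum(sum(b) for b in blocks) / (p * n)       # mean of all x
    block_means = [sum(b) / n for b in blocks]          # mean of each block
    total = sum((x - grand) ** 2 for b in blocks for x in b)
    q1 = n * sum((bm - grand) ** 2 for bm in block_means)
    q2 = sum((x - bm) ** 2 for b, bm in zip(blocks, block_means) for x in b)
    return total, q1, q2

def q1_from_block_sums(blocks):
    """q1 written as sum_ij a_ij S_i S_j with S_i the block totals."""
    p, n = len(blocks), len(blocks[0])
    S = [sum(b) for b in blocks]
    diag, off = (p - 1) / (p * n), -1.0 / (p * n)
    return sum((diag if i == j else off) * S[i] * S[j]
               for i in range(p) for j in range(p))

# Hypothetical data: p = 3 blocks of n = 4 observations.
blocks = [[1.0, 2.0, 3.0, 6.0], [2.0, 4.0, 4.0, 6.0], [0.0, 1.0, 5.0, 2.0]]
total, q1, q2 = block_decomposition(blocks)
print(total, q1, q2, q1_from_block_sums(blocks))
```

Note that every coefficient $a_{ij}$ is of order $1/n$ in absolute value, which is the property the text relies on.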

Clearly, if only the rank of gi„ increases as n increases, the rank m*» of gt» 
being constant and if the maximum of the absolute values of the elements of 
the matrix of g»„ also approaches zero as n increases, then v n will have a limiting 
distribution which is the analysis of variance distribution, and the limiting 

distribution of —~— will be the correlation ratio distribution. 
gi» + gin 

(c). Periodogram analysis. We need only remark that the linear functions which are used in the analysis of the Schuster periodogram$^{26}$ meet all the requirements of Corollary I if the $x_\nu$ are independently distributed with zero means and constant variances and satisfy the condition $\mathcal{L}$. Consequently the large sample theory of the Schuster periodogram is the same for non-normal as it is for normal distributions.

(d) . Multivariate analysis. We shall assume that the random vectors 

Xi , • • • , ( X, has components Xi ,, • • • , a>), are independently distributed, that 
(2.3) and (2.4) are satisfied, and that the condition i? p is satisfied. For any 
fixed n and a we shall call the determinant D* of the forms (3.5) a generalized 
sum of squares, and the determinant VI of the elements a generalized 

variance. We shall say that Dg and Vg have rank mg and that D* and FT 
have rank n*„ . If mg is constant, and if (3.7) and (3.8) are true then clearly 
the limiting distribution of Dg is the distribution of the generalized variance 
of mg vector observations 27 from a normal distribution, with zero means and 
covariance parameters <r <} . Under the same conditions, the limiting distri¬ 
bution of Dg/VJ! is the distribution of the generalized variance of mg vector 
observations from a normal distribution with zero means and covariance pa¬ 
rameters in . Many other similar limiting distributions are immediately 
derivable. 

Before completing our discussion of the limiting distributions of statistics occurring in multivariate analysis, we shall state a theorem on limiting distributions which is an obvious generalization of a theorem of Doob [4, p. 166].

Suppose that the random variables g(n)X iH ,. • • • , g(n)X„ n have a distribution 
function F(g(n)X u , • • • , g(n)X pn ) which is such that 

lim F(g(n)X m, • • •, g(n)X„) = F(X lt X p ), 

where F(X i, • • • , X p ) is a continuous distribution function, and suppose that 
Xin converges in probability to the real number £<. For example, if £» = 

26 The theory of the Schuster periodogram is given by Fisher [7].

27 See Wilks [18, p. 476] or Madow [15, pp. 481, 484].





$\sum_{\nu} x_\nu / n$ where $E(x_\nu) = 0$, $E(x_\nu^2) = 1$, and $\mathcal{L}$ is satisfied, then $\bar{x}_n$ converges to zero with probability one, and $\sqrt{n}\, \bar{x}_n$ has a limiting distribution which is normal with zero mean and unit variance, i.e.

$$\lim_{n \to \infty} |P\{\sqrt{n}\, \bar{x}_n < x\} - N(x; 1)| = 0.$$

Theorem VII. Let <p/(k , • • • , t„) be a function of k , ■ ■ ■ ,t p defined in a 
neighborhood N of , • • • , f p which, together with its (k; + l)-th partial deriva¬ 
tives is continuous in N. Suppose that k is the least value of rj such that the 
random variables M 

fo(n)r'[i: (x<» - 

have a joint limiting distribution function D(xi , • • * , x t ). Then the random 
variables [g(n)] kf [<pf(x in , • • • , x pn ) — 1 , • • • , £ P )] have a joint limiting distri¬ 

bution which is given by D(x\ , • • • , x,). The value k/ is greater than or equal to 
the minimum value for which not all the partial derivatives of order kj vanish at 
f i > * * • > • 

The proof is almost word for word that of Doob, the only difference being 
the removal of the specializing words. 

We now consider the limiting distribution of the ratio of generalized sums of 
squares L n which is defined by 



where D*+i is the determinant of the forms 6 ?,-* + 6 ?/i = *+i. It has been 

shown that 29 


l = TT JjL 

n Vi7h.’ 

where F”, , (j = k, k + 1 ), is a ratio of generalized sums of squares 


n = 


_ IK 


IKwl’ 


(r, 8 — 1 , • • •, i; u, v = 1 , • • •, i — 1 ; 600 / = 1 ). 


Since F?y/m, n converges with the probability one to | <r w |/| <r u * |, and since, 

by Corollary VIII the joint limiting distribution of the m^i » (1 — ) is 

\ Y ih+ 1 / 


18 See Goursat-Hedrick, [10, p. 107] for a statement of the Taylor expansion of functions 
of several variables, which we use here, by ^ : - 


d<p f (zi, ... ,x p ) 




is meant the value of 


dxi 

*• See Madow, [15, p. 


at the point & , ... , £ p . 





II G(xi ; m, 1) it follows, by Theorem VII, that the joint limiting distribution of 
the ratios of generalized sums of squares 


i v* 1 

IT 


is 

II G(xi ; imi , 1) 

i 

and that the limiting distribution of m*+j „(1 — L„) is 30 

G(x; pmi , 1). 

In a following paper, these results will be extended to quadratic forms in 
non-central random variables. 


5. Summary. In Section 2, Theorem I, we stated a very general form of the 
Laplace-Liapounoff theorem based on the Lindeberg condition. In four corol¬ 
laries, this theorem was shown to provide joint limiting distributions for sys¬ 
tems of linear forms which are such that the maximum of the absolute values 
of their coefficients converge to zero with an increase in the size of the sample 
if the coefficients are constants, and converge in probability to zero with an 
increase in the size of the sample if the coefficients are themselves random 
variables. It was shown that under certain conditions functions of several 
random variables, which are such that each function is a linear function of 
certain random variables for fixed values of random variables of lower index, 
also have a normal multivariate limiting distribution. 

These results were extended to include limiting distributions of quadratic 
and bilinear forms in Section 3. The method of extension was to show that 
necessary and sufficient conditions for the existence of systems of linear forms 
satisfying the conditions of Section 2 are provided by rather simple conditions, 
the most important of which is that the greatest of the absolute values of the 
elements of the matrices of the quadratic and bilinear forms approach zero if 
the size of the sample increases, the ranks of the forms remaining unaltered. 
This led to the theorem that quadratic and bilinear forms having such matrices have $\chi^2$, or covariance, or Wishart’s distribution as limiting distributions.
It was then shown, in Theorem IV, that if the rank of the sum of the matrices 
of the quadratic and bilinear forms is equal to the sum of the ranks of the ma¬ 
trices, and if certain of these ranks do not change as the size of the sample 
increases, then the system of quadratic and bilinear forms have Wishart’s 
distribution in the limit provided the other conditions are met. These results 


30 A generalization of Wilks’ result [19, p. 323] to the case where the variates are not assumed to have a normal multivariate distribution may readily be obtained.





were then extended in Theorem V to one of the cases occurring when the coeffi¬ 
cients of the forms are themselves random variables. 

Several simple illustrations of the uses of the methods were given in Section 4. 
It was shown that the analysis of the variance ratios, and statistics occurring 
in the theory of multivariate statistical analysis have the same limiting distri¬ 
butions which they would have had if their variables had been normally and 
independently distributed. 


REFERENCES 

[1] M. S. Bartlett and J. Wishart, “The generalized product moment distribution in a normal system,” Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 260-270.

[2] W. G. Cochran, “The distribution of quadratic forms in a normal system, with applications to the analysis of covariance,” Proc. Camb. Phil. Soc., Vol. 30 (1934), pp. 178-191.

[3] H. Cramér, Random Variables and Probability Distributions, Camb. Tracts in Math. and Math. Physics, No. 36, London, 1937.

[4] J. L. Doob, “The limiting distributions of certain statistics,” Annals of Math. Stat., Vol. 6 (1935), pp. 160-169.

[5] J. L. Doob, “Stochastic processes with an integral-valued parameter,” Trans. Am. Math. Soc., Vol. 44 (1938), pp. 87-150.

[6] R. A. Fisher, “Applications of ‘Student’s’ distribution,” Metron, Vol. 5 (1926), pp. 90-104.

[7] R. A. Fisher, “Tests of significance in harmonic analysis,” Proc. Roy. Soc., (A), Vol. 125 (1929), pp. 54-60.

[8] R. A. Fisher, Statistical Methods for Research Workers, 7th ed., Oliver and Boyd, London, 1938.

[9] M. Fréchet, Recherches Théoriques Modernes sur la Théorie des Probabilités, Vol. 1, Gauthier-Villars, Paris, 1937.

[10] E. Goursat, A Course in Mathematical Analysis, Vol. 1, translated by E. R. Hedrick, Ginn and Co., New York, 1904.

[11] A. Kolmogoroff, “Grundbegriffe der Wahrscheinlichkeitsrechnung,” Ergebnisse der Mathematik, Vol. 2, no. 3.

[12] S. Kolodziejczyk, “On an important class of statistical hypotheses,” Biometrika, Vol. 27 (1935), pp. 161-190.

[13] P. Lévy, Calcul des Probabilités, Gauthier-Villars, Paris, 1925.

[14] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1937.

[15] W. G. Madow, “Contributions to the theory of multivariate statistical analysis,” Trans. Am. Math. Soc., Vol. 44 (1938), pp. 454-495.

[16] W. G. Madow, “The distribution of quadratic forms in non-central normal random variables,” Annals of Math. Stat., Vol. 11 (1940), pp. 100-103.

[17] S. Saks, Theory of the Integral, 2nd ed., G. E. Stechert and Co., New York, 1937.

[18] S. S. Wilks, “Certain generalizations in the analysis of variance,” Biometrika, Vol. 24 (1932), pp. 472-494.

[19] S. S. Wilks, “On the independence of k sets of normally distributed statistical variables,” Econometrica, Vol. 3 (1935), pp. 309-326.

[20] J. Wishart and M. S. Bartlett, See [1].


Washington, D. C. 



ON A TEST WHETHER TWO SAMPLES ARE FROM THE SAME 

POPULATION 1 

By A. Wald 2 and J. Wolfowitz 

1. The Problem.$^{3}$ Let $X$ and $Y$ be two independent stochastic variables about whose cumulative distribution functions nothing is known except that they are continuous. Let $x_1, x_2, \cdots, x_m$ be a set of $m$ independent observations on $X$ and let $y_1, \cdots, y_n$ be a set of $n$ independent observations on $Y$. It is desired to test the hypothesis (the null hypothesis) that the distribution functions of $X$ and $Y$ are identical.

An important step in statistical theory was made when “Student” proposed 
his ratio of mean to standard deviation for a similar purpose. In the problem 
treated by “Student” the distribution functions were assumed to be of known 
(normal) form and completely specified by two parameters. It is clear that in 
the problem to be considered here the distributions cannot be specified by any 
finite number of parameters. 

It might nevertheless be argued that by virtue of the limit theorems of 
probability theory, “Student’s” ratio might be used in our problem for large 
samples. Such a procedure is open to very serious objections. The popula¬ 
tion distributions may be of such form (e.g., Cauchy distribution) that the limit 
theorems do not apply. Furthermore, the distributions of X and Y may be 
radically different and yet have the same first two moments; clearly “Student’s” 
ratio will not distinguish between two such distributions. 

The Pearson contingency coefficient is a useful test specifically designed for the problem we are discussing here, but one which also possesses some disadvantages. The location of the class intervals is to a considerable extent arbitrary. In order to use the $\chi^2$ distribution, the numbers in each class interval must not be small; often this can be done only by having large class intervals, thus entailing a loss of information.

2. Preliminary remarks. Denote by $P\{X < x\}$ the probability of the relation in braces. Let $f(x)$ and $g(x)$ be the distribution functions of $X$ and $Y$ respectively; e.g., $P\{X < x\} = f(x)$. Throughout this paper we shall assume that $f(x)$ and $g(x)$ are continuous.

Let the set of $m + n$ elements $x_1, \cdots, x_m$ and $y_1, \cdots, y_n$ be arranged in

1 Presented to the Institute of Mathematical Statistics at Philadelphia, December 27, 1939.

2 Research under a grant-in-aid from the Carnegie Corporation of New York.

3 The authors are indebted to Prof. S. S. Wilks for proposing this problem to them.





ascending order of magnitude, and let the sequence be designated by $Z$, thus: $Z = z_1, z_2, \cdots, z_{m+n}$, where $z_1 < z_2 < \cdots < z_{m+n}$. ($f(x)$ and $g(x)$ were assumed to be continuous. Hence the probability is 0 that $z_i = z_{i+1}$ and therefore we may exclude this case.) Let $V = v_1, v_2, \cdots, v_{m+n}$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \cdots, x_m$ and $v_i = 1$ if $z_i$ is a member of the set $y_1, \cdots, y_n$. It is easy to show that any statistic $S$ used to test the null hypothesis should be invariant under any continuous, reciprocally one-to-one transformation of the real axis. That is to say, if $t' = \varphi(t)$ is any such transformation, then

(1) $S(x_1, \cdots, x_m, y_1, \cdots, y_n) \equiv S(\varphi(x_1), \cdots, \varphi(x_m), \varphi(y_1), \cdots, \varphi(y_n)).$

The reason for this requirement on $S$ is the fact that the transformed stochastic variables $X' = \varphi(X)$ and $Y' = \varphi(Y)$ are continuous and have identical distributions if and only if $X$ and $Y$ have identical distributions. Hence $S$ must be a function of $V$ only, with the added restriction that $S(V) = S(V')$, where $V' = v_{m+n}, v_{m+n-1}, \cdots, v_1$. For if $S$ were a function of $x_1, \cdots, x_m, y_1, \cdots, y_n$ which cannot be expressed as a function of $V$ alone, then there exists a continuous reciprocally one-to-one transformation $t' = \varphi(t)$ such that (1) is not true. On the other hand, any continuous reciprocally one-to-one transformation of the entire line into itself is monotonic and hence either leaves $V$ invariant or else transforms it into $V'$.
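The last remark can be illustrated with a small sketch: an increasing transformation leaves the label sequence $V$ unchanged, while a decreasing one reverses it. The data and the helper name `label_sequence` are hypothetical, and the sample is assumed to contain no ties.

```python
import math

def label_sequence(xs, ys):
    """Return V: 0 where the ranked observation comes from xs, 1 from ys."""
    pooled = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    pooled.sort(key=lambda pair: pair[0])  # ascending order of magnitude
    return [label for _, label in pooled]

# Hypothetical observations (no ties).
xs = [0.3, 1.7, 2.2]
ys = [0.9, 1.1, 3.0, 4.5]

V = label_sequence(xs, ys)
# Increasing transformation t' = exp(t): V is unchanged.
V_inc = label_sequence([math.exp(x) for x in xs], [math.exp(y) for y in ys])
# Decreasing transformation t' = -t: V is reversed.
V_dec = label_sequence([-x for x in xs], [-y for y in ys])
print(V, V_inc, V_dec)
```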

3. Previous results. In an interesting paper on this problem W. R. Thompson [1] proceeds as follows: Let the sets $x_1, \cdots, x_m$ and $y_1, \cdots, y_n$ be ordered in ascending order of magnitude, thus: $x_{p_1}, x_{p_2}, \cdots, x_{p_m}$ and $y_{p'_1}, y_{p'_2}, \cdots, y_{p'_n}$, where $x_{p_1} < x_{p_2} < \cdots < x_{p_m}$ and $y_{p'_1} < y_{p'_2} < \cdots < y_{p'_n}$. Let $P\{x_{p_k} < y_{p'_{k'}}\}$ denote the probability of the relation in braces under the null hypothesis ($f(x) \equiv g(x)$). This probability is shown to be independent of $f(x)$ and the relation

(2) $P\{x_{p_k} < y_{p'_{k'}}\} = \psi(m, n, k, k')$

holds, where the right member, which is given explicitly by Thompson, is a function only of the arguments exhibited. To make a test of the null hypothesis with, say, a 5% level of significance, this writer proposes to choose $k$ and $k'$ so that $\psi(m, n, k, k') = .05$. The test would then consist of noticing whether $x_{p_k} < y_{p'_{k'}}$ or not. In the former case the null hypothesis is to be considered as disproved.

It is clear that this test cannot be very efficient, ignoring as it does so many of the relations among the observations. Except under certain rather narrow restrictions on the admissible alternatives, for example, that $g(x) \equiv f(x + c)$, where $c$ is an arbitrary constant, the test suffers the further defect of not being “consistent” in a way which will be discussed below. Hence the test suggested by Thompson can scarcely be regarded as a satisfactory solution of the problem. This criticism, of course, does not apply to those sections of Thompson’s paper which deal with the question of estimating the so-called normal range.




4. The statistic U. A subsequence $v_{s+1}, v_{s+2}, \cdots, v_{s+r}$ of $V$ (where $r$ may also be 1) will be called a “run” if $v_{s+1} = v_{s+2} = \cdots = v_{s+r}$ and if $v_s \neq v_{s+1}$ when $s > 0$ and if $v_{s+r} \neq v_{s+r+1}$ when $s + r < m + n$. For example, $V = 1, 0, 0, 1, 1, 0$ contains the following runs: 1; 0, 0; 1, 1; 0. The statistic$^{4}$ $U$ defined as the number of runs in $V$ seems a suitable statistic for testing the hypothesis that $f(x) \equiv g(x)$. In the event that the latter identity holds, the distribution of $U$ is independent of $f(x)$. A difference between $f(x)$ and $g(x)$ tends to decrease $U$. $U$ is consistent in a sense which will be discussed below.
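The definition of a run translates directly into a few lines of code. The sketch below, with illustrative function names, splits a 0/1 sequence into maximal blocks of equal elements and counts them, reproducing the example $V = 1, 0, 0, 1, 1, 0$ from the text.

```python
from itertools import groupby

def runs(V):
    """Split V into its runs, i.e. maximal blocks of equal consecutive elements."""
    return [list(block) for _, block in groupby(V)]

def count_runs(V):
    """The statistic U: the number of runs in V."""
    return len(runs(V))

V = [1, 0, 0, 1, 1, 0]
print(runs(V))        # [[1], [0, 0], [1, 1], [0]]
print(count_runs(V))  # 4
```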
In order to derive the distribution of $U$ under the null hypothesis, we first note that all the $\frac{(m+n)!}{m!\, n!}$ $(= {}^{m+n}C_m)$ possible sequences $V$ have the same probability $q = \frac{m!\, n!}{(m+n)!}$. To see this, consider the sequence $V$ where $v_i = 0$ $(i = 1, 2, \cdots, m)$ and $v_i = 1$ $(i = m + 1, m + 2, \cdots, m + n)$. Clearly the probability of the sequence is

$$q = \frac{m(m-1) \cdots 1 \cdot n(n-1) \cdots 1}{(m+n)(m+n-1) \cdots (n+1)\, n(n-1) \cdots 1}.$$

Furthermore, the probability of any other sequence is equal to the product of the factors in the numerator of $q$ taken in a different order, divided by the product of the factors in the denominator taken in the same order. The quotient is, of course, $= q$.
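For small samples the equal-probability claim can be confirmed by brute force. The sketch below is an illustration, not the authors' argument: it assumes that under the null hypothesis all $(m+n)!$ orderings of the labelled observations are equally likely, and counts how many orderings produce each sequence $V$.

```python
from collections import Counter
from itertools import permutations
from math import comb

def sequence_counts(m, n):
    """Count, over all orderings of m + n labelled observations,
    how often each 0/1 sequence V arises."""
    labels = [0] * m + [1] * n  # observation index -> sample label
    counts = Counter()
    for order in permutations(range(m + n)):
        counts[tuple(labels[i] for i in order)] += 1
    return counts

counts = sequence_counts(2, 3)
print(len(counts), set(counts.values()))
```

With $m = 2$, $n = 3$ there are ${}^{5}C_2 = 10$ distinct sequences, each arising from $m!\, n! = 12$ of the 120 orderings, so each has probability $q = 12/120$.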

Let $e_0$ be the number of runs in $V$ whose elements are 0 and let $e_1$ be the number of runs whose elements are 1. Obviously $U = e_0 + e_1$. Let the runs of each kind be arranged in the ascending order of the indices of the $v_i$. Let $r_{0j}$ be the number of elements 0 in the $j$th run of that kind $(j = 1, 2, \cdots, e_0)$ and let $r_{1j'}$ be the number of elements 1 in the $j'$th run of that kind $(j' = 1, 2, \cdots, e_1)$. The following relations obviously hold:

(3) $\sum_{j=1}^{e_0} r_{0j} = m,$

(4) $\sum_{j'=1}^{e_1} r_{1j'} = n,$

(5) $1 \le e_0 \le m, \quad 1 \le e_1 \le n,$

(6) $|e_0 - e_1| \le 1.$


4 When this paper was already in proof, our attention was called to a paper by W. L. Stevens, entitled “Distribution of groups in a sequence of alternatives,” Annals of Eugenics, Vol. 9 (1939). There a statistic, which is essentially the U statistic, is proposed for a problem different from that considered by us and the distribution of U is obtained in a different manner. However, the application of the U statistic for the purpose herein described, the proof of consistency and the other results of our paper are not contained in it.





Hence if $U = 2k$, then $e_0 = e_1 = k$, and if $U = 2k - 1$, then either $e_0 = k$, $e_1 = k - 1$ or $e_0 = k - 1$, $e_1 = k$. The element $v_1$ of $V$ together with the numbers $r_{01}, r_{02}, \cdots, r_{0e_0}, r_{11}, r_{12}, \cdots, r_{1e_1}$ completely determines the sequence $V$ whose probability is $q$.

Without loss of generality we may assume that $m \le n$. If $U = 2k$, $1 \le k \le m$, $v_1 = 0$, any two sequences of $k$ positive numbers each may constitute a sequence of $r_{01}, \cdots, r_{0k}$ and $r_{11}, \cdots, r_{1k}$, provided only that (3) and (4) are satisfied. The number of sequences $r_{01}, r_{02}, \cdots, r_{0k}$ which satisfy (3) is the coefficient of $a^m$ in the purely formal expansion of

$$(a + a^2 + a^3 + \cdots)^k = \left(\frac{a}{1 - a}\right)^k$$

and hence is ${}^{m-1}C_{k-1}$. Similarly the number of sequences $r_{11}, r_{12}, \cdots, r_{1k}$ which satisfy (4) is found to be ${}^{n-1}C_{k-1}$. Bearing in mind the case $U = 2k$, $v_1 = 1$, we obtain

(7) $P\{U = 2k\} = \dfrac{2 \cdot {}^{m-1}C_{k-1} \cdot {}^{n-1}C_{k-1}}{{}^{m+n}C_m}, \quad (k = 1, 2, \cdots, m),$

where the left member denotes the probability of the relation in braces under the null hypothesis. In a similar manner we obtain

(8) $P\{U = 2k - 1\} = \dfrac{{}^{m-1}C_{k-1} \cdot {}^{n-1}C_{k-2} + {}^{m-1}C_{k-2} \cdot {}^{n-1}C_{k-1}}{{}^{m+n}C_m}, \quad (k = 2, \cdots, m + 1),$

with the proviso that ${}^{a}C_{b} = 0$ if $a < b$.
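Formulas (7) and (8) are easy to tabulate exactly. The sketch below is one possible rendering, assuming the standard forms $P\{U = 2k\} = 2\,{}^{m-1}C_{k-1}\,{}^{n-1}C_{k-1} / {}^{m+n}C_m$ and $P\{U = 2k-1\} = ({}^{m-1}C_{k-1}\,{}^{n-1}C_{k-2} + {}^{m-1}C_{k-2}\,{}^{n-1}C_{k-1}) / {}^{m+n}C_m$; the names `C` and `run_pmf` are illustrative.

```python
from fractions import Fraction
from math import comb

def C(a, b):
    """Binomial coefficient aCb, taken as 0 when b < 0 or a < b."""
    return comb(a, b) if 0 <= b <= a else 0

def run_pmf(m, n):
    """Exact null distribution of U for sample sizes m and n, as {u: P{U = u}}."""
    total = comb(m + n, m)
    pmf = {}
    for u in range(2, m + n + 1):
        k = (u + 1) // 2  # U = 2k (even u) or U = 2k - 1 (odd u)
        if u % 2 == 0:
            ways = 2 * C(m - 1, k - 1) * C(n - 1, k - 1)
        else:
            ways = (C(m - 1, k - 1) * C(n - 1, k - 2)
                    + C(m - 1, k - 2) * C(n - 1, k - 1))
        pmf[u] = Fraction(ways, total)
    return pmf

print(run_pmf(2, 2))
print(sum(run_pmf(3, 5).values()))
```

Exact rational arithmetic is used so that the probabilities can be checked to sum to one without rounding error.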

We shall now briefly indicate a method of obtaining the mean $E(U)$ and variance $\sigma^2(U)$ of $U$. For example, $E(U)$ may be obtained by performing several summations of the type

(9) $\sum_{i=0}^{m-1} i \cdot {}^{m-1}C_i \cdot {}^{n-1}C_i.$

It is easy to verify that the expression (9) is the term free of $a$ in the purely formal expansion in $a$ of:

(10) $(m - 1) \cdot (1 + a)^{m-2} \cdot a \cdot \left(1 + \frac{1}{a}\right)^{n-1}$

and hence is

(11) $(m - 1) \cdot {}^{m+n-3}C_{n-2}.$






The other summations required for the mean and variance can be carried out in a similar manner. We shall omit these tedious calculations. The results are:

(12) $E(U) = \dfrac{2mn}{m + n} + 1,$

(13) $\sigma^2(U) = \dfrac{2mn(2mn - m - n)}{(m + n)^2 (m + n - 1)}.$

The critical region for testing the null hypothesis on a level of significance $\beta$ is given by the inequality $U < u_0$, where $u_0$ is a function of $m$ and $n$ such that $P\{U < u_0\} = \beta$.
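The closed forms for the mean and variance can be compared with moments computed directly from the exact distribution. The sketch below assumes the formulas $E(U) = 2mn/(m+n) + 1$ and $\sigma^2(U) = 2mn(2mn - m - n)/((m+n)^2(m+n-1))$. Since $U$ is discrete, an exact $\beta$ is usually unattainable; `critical_value` therefore adopts the convention (an assumption, not stated in the paper) of taking the largest $u_0$ with $P\{U < u_0\} \le \beta$.

```python
from fractions import Fraction
from math import comb

def C(a, b):
    """Binomial coefficient aCb, taken as 0 when b < 0 or a < b."""
    return comb(a, b) if 0 <= b <= a else 0

def run_pmf(m, n):
    """Exact null distribution of the number of runs U."""
    total, pmf = comb(m + n, m), {}
    for u in range(2, m + n + 1):
        k = (u + 1) // 2
        if u % 2 == 0:
            ways = 2 * C(m - 1, k - 1) * C(n - 1, k - 1)
        else:
            ways = (C(m - 1, k - 1) * C(n - 1, k - 2)
                    + C(m - 1, k - 2) * C(n - 1, k - 1))
        pmf[u] = Fraction(ways, total)
    return pmf

def moments_from_pmf(pmf):
    """Mean and variance computed by direct summation over the distribution."""
    mean = sum(u * p for u, p in pmf.items())
    var = sum((u - mean) ** 2 * p for u, p in pmf.items())
    return mean, var

def closed_form(m, n):
    """Mean and variance of U from the closed-form expressions."""
    mean = Fraction(2 * m * n, m + n) + 1
    var = Fraction(2 * m * n * (2 * m * n - m - n), (m + n) ** 2 * (m + n - 1))
    return mean, var

def critical_value(m, n, beta):
    """Largest u0 with P{U < u0} <= beta (convention assumed for discrete U)."""
    pmf, cumulative, u0 = run_pmf(m, n), 0, 2
    for u in sorted(pmf):
        cumulative += pmf[u]
        if cumulative > beta:
            break
        u0 = u + 1
    return u0

print(moments_from_pmf(run_pmf(3, 5)), closed_form(3, 5))
print(critical_value(3, 5, Fraction(1, 20)))
```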


5. The asymptotic distribution of U. Let $m/n = \alpha$, a positive constant. Then, as $m \to \infty$,

$$E(U) \sim \frac{2m}{1 + \alpha}, \qquad \sigma^2(U) \sim \frac{4\alpha m}{(1 + \alpha)^3}.$$
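These asymptotic forms can be checked by substituting $n = m/\alpha$ into the exact mean and variance. A short worked verification, assuming (12) and (13) read $E(U) = 2mn/(m+n) + 1$ and $\sigma^2(U) = 2mn(2mn - m - n)/((m+n)^2(m+n-1))$:

```latex
\begin{align*}
m + n &= \frac{m(1+\alpha)}{\alpha}, \qquad
\frac{mn}{m+n} = \frac{m^2/\alpha}{m(1+\alpha)/\alpha} = \frac{m}{1+\alpha}, \\
E(U) &= \frac{2mn}{m+n} + 1 = \frac{2m}{1+\alpha} + 1 \sim \frac{2m}{1+\alpha}, \\
\sigma^2(U) &= \frac{2mn\,(2mn - m - n)}{(m+n)^2\,(m+n-1)}
  \sim \frac{4m^2n^2}{(m+n)^3}
  = 4\left(\frac{m}{1+\alpha}\right)^{2} \cdot \frac{\alpha}{m(1+\alpha)}
  = \frac{4\alpha m}{(1+\alpha)^3}.
\end{align*}
```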


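Theorem I. If $t$ is any real number, the probability of the relation

$$U < \frac{2m}{1 + \alpha} + 2 \left[\frac{\alpha m}{(1 + \alpha)^3}\right]^{\frac{1}{2}} t$$

converges uniformly in $t$ to

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{1}{2} w^2}\, dw$$

as $m \to \infty$.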

The proof of this theorem is essentially the same as the classical proof that the binomial law converges to the normal distribution (see, for example, Fréchet [2], p. 89) and it will be unnecessary to give the details. Since the asymptotic distribution of the subpopulation of even $U$ is the same as that of odd $U$, it will be sufficient to consider only the right member of (7). Let $m' = m - 1$, $n' = n - 1$, and $k' = k - 1$. We make the substitution

(14) $w = \dfrac{k' - \dfrac{m'}{1 + \alpha'}}{\sqrt{m'}}, \quad \text{where } \alpha' = \dfrac{m'}{n'},$

(15) $dw = \dfrac{1}{\sqrt{m'}}$

and evaluate the factorials by Stirling’s formula. We shall give here only the results of successive simplifications. At each step we shall omit the factors free of $k$ or $w$, since their product may be reconstructed from the final exponential form. Thus instead of the right member of (7) we can consider the expression:

(16) ${}^{m-1}C_{k-1} \cdot {}^{n-1}C_{k-1}.$



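Omitting factors free of $k$, we get

(17) $\dfrac{1}{(k-1)!\,(m-k)!\,(k-1)!\,(n-k)!}$

and by Stirling’s formula, since $k$ and $m$ are both large:

(18) $k'^{-(2k'+1)}\, (m' - k')^{-(m' - k' + \frac{1}{2})}\, (n' - k')^{-(n' - k' + \frac{1}{2})}.$

Now apply (14). We obtain

(19) $\left(\sqrt{m'}\, w + \dfrac{m'}{1+\alpha'}\right)^{-\left(2\sqrt{m'}\, w + \frac{2m'}{1+\alpha'} + 1\right)} \left(\dfrac{\alpha' m'}{1+\alpha'} - \sqrt{m'}\, w\right)^{-\left(\frac{\alpha' m'}{1+\alpha'} - \sqrt{m'}\, w + \frac{1}{2}\right)} \left(\dfrac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}\, w\right)^{-\left(\frac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}\, w + \frac{1}{2}\right)}.$

Dividing inside the parentheses by $\dfrac{m'}{1+\alpha'}$, $\dfrac{\alpha' m'}{1+\alpha'}$, $\dfrac{m'}{\alpha'(1+\alpha')}$ respectively, and again omitting factors free of $w$, we get

(20) $\left(1 + \dfrac{(1+\alpha')\, w}{\sqrt{m'}}\right)^{-\left(2\sqrt{m'}\, w + \frac{2m'}{1+\alpha'} + 1\right)} \left(1 - \dfrac{(1+\alpha')\, w}{\alpha' \sqrt{m'}}\right)^{-\left(\frac{\alpha' m'}{1+\alpha'} - \sqrt{m'}\, w + \frac{1}{2}\right)} \left(1 - \dfrac{\alpha'(1+\alpha')\, w}{\sqrt{m'}}\right)^{-\left(\frac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}\, w + \frac{1}{2}\right)}.$

Taking logarithms, expanding in powers of $w/\sqrt{m'}$ and neglecting terms in $m'^{-\frac{1}{2}}$ and higher orders, the results are

(21) $-\left(2\sqrt{m'}\, w + \dfrac{2m'}{1+\alpha'}\right)\left(\dfrac{(1+\alpha')\, w}{\sqrt{m'}} - \dfrac{(1+\alpha')^2 w^2}{2m'}\right) + \left(\dfrac{\alpha' m'}{1+\alpha'} - \sqrt{m'}\, w\right)\left(\dfrac{(1+\alpha')\, w}{\alpha' \sqrt{m'}} + \dfrac{(1+\alpha')^2 w^2}{2\alpha'^2 m'}\right) + \left(\dfrac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}\, w\right)\left(\dfrac{\alpha'(1+\alpha')\, w}{\sqrt{m'}} + \dfrac{\alpha'^2 (1+\alpha')^2 w^2}{2m'}\right)$

which equals

(22) $-\dfrac{(1+\alpha')^3}{2\alpha'}\, w^2 + O(m'^{-\frac{1}{2}}).$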


The proof of the fact that the distribution of $w$ converges uniformly to the normal distribution with zero mean and variance $\dfrac{\alpha'}{(1 + \alpha')^3}$ can be carried out in the same way as the classical proof that the binomial law converges to the normal distribution.








It is obvious that

$$w^* = \frac{k - \dfrac{m}{1 + \alpha}}{\sqrt{m}}$$

has the same limiting distribution as $w$. From this and from the fact that $U = 2k$ or $2k - 1$ Theorem I follows.

In using conventional tables of the Gaussian function to make tests of significance on $U$ when $m$ and $n$ are large, the reader is urged not to forget that the critical region of $U$ lies in only one tail of the curve.


6. An example. We give here a simple example illustrating the use of the 
statistic U and Theorem I. 

Suppose 50 observations were made on $X$ and 50 observations on $Y$. Suppose further that these observations are arranged in ascending order and that the $i$th element of this sequence is said to have the rank $i$. The observations on $X$
occupy the following ranks: 1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26, 
27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58, 62, 63, 64, 65, 
68, 69, 75, 79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95. 

The observations on Y occupy the remaining ranks. 

In this case, $U = 34$.

For $m = n = 50$,

$$E(U) = 51, \qquad \sigma^2(U) = 24.747.$$

The probability of getting 34 runs or less when the distribution functions of $X$ and $Y$ are continuous and identical is therefore less than $5 \cdot 10^{-4}$.
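The figures in this example can be reproduced from the listed ranks. The sketch below rebuilds $V$, counts the runs, and applies the normal approximation of Theorem I in its simplest form (standardising $U$ by the exact mean and variance, without a continuity correction, which is an assumption about how the authors computed the tail).

```python
from itertools import groupby
from math import erfc, sqrt

# Ranks occupied by the 50 observations on X, as listed in the text.
x_ranks = [1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26,
           27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 56,
           57, 58, 62, 63, 64, 65, 68, 69, 75, 79, 80, 81, 86, 87, 89,
           90, 91, 93, 94, 95]
m, n = len(x_ranks), 50

x_set = set(x_ranks)
V = [0 if rank in x_set else 1 for rank in range(1, m + n + 1)]
U = sum(1 for _ in groupby(V))  # number of runs in V

mean = 2 * m * n / (m + n) + 1
var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))

z = (U - mean) / sqrt(var)
p_normal = 0.5 * erfc(-z / sqrt(2))  # standard normal probability below z

print(U, mean, round(var, 3), p_normal)
```

Only the lower tail is computed, in line with the remark that the critical region lies in one tail.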


7. Consistency. We shall say that a test is “consistent” if the probability 
of rejecting the null hypothesis when it is false (i.e., the complement of the 
probability of a type II error, cf. Neyman and Pearson, [3]) approaches one 
as the sample number approaches infinity. In the literature of statistics a 
function of the observations which converges stochastically to a population 
parameter as the sample number approaches infinity, is called a “consistent” 
statistic. If a test of a hypothesis about a population parameter is made by a 
proper use of a consistent (statistic) estimate of the parameter, the test will 
be consistent also according to our definition, which thus furnishes an extension 
of the idea of consistency to the case where the alternatives to the null hypothe¬ 
sis cannot be specified by a finite number of parameters. 

It is obvious that consistency ought to be a minimal requirement of any good 
test. It is the purpose of this section to prove that, subject to some slight and 
from the practical statistical point of view, unimportant, restrictions on the 
distribution functions, the test furnished by the statistic U is consistent. 

We shall say that the distribution functions $f(x)$ and $g(x)$ satisfy the condition A, if, for any arbitrarily small positive $\delta$, there exist a finite number of





closed intervals, such that the probability of the sum $I$ of these intervals is $> 1 - \delta$ according to at least one of the distribution functions $f(x)$ and $g(x)$, and such that $f(x)$ and $g(x)$ have positive continuous derivatives $f'(x)$ and $g'(x)$ in $I$.

In all that follows, although $m$ and $n$ are considered as variables, their ratio $m/n$ is to be a constant, denoted by $\alpha$. Let $\beta > 0$ denote the level of significance on which the test is to be made, so that, if $f(x) \equiv g(x)$,

(23) $P\{U < u_0(m)\} = \beta$

where the critical region for two samples of size $m$ and $n$, respectively, is given by

$$U < u_0(m).$$

Theorem II. If $f(x)$ and $g(x)$ satisfy condition A, and if

(24) $f(x) \not\equiv g(x),$

then

(25) $\lim_{m \to \infty} P\{U < u_0(m)\} = 1.$

The proof of this theorem will be given in several stages. 

Let $E\left(\frac{U}{m}; f, g\right)$ and $\sigma^2\left(\frac{U}{m}; f, g\right)$ denote the mean and variance, respectively, of $\frac{U}{m}$, when $X$ and $Y$ have the distribution functions $f(x)$ and $g(x)$, respectively, and the sample numbers are $m$ and $n$. Let the set $x_1, \cdots, x_m$; $y_1, \cdots, y_n$ be arranged in ascending order of magnitude, thus:

(26) $Z = z_1, z_2, \cdots, z_{m+n},$

where $z_1 < z_2 < \cdots < z_{m+n}$. The sequence

(27) $V = v_1, v_2, \cdots, v_{m+n}$

is defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \cdots, x_m$ and $v_i = 1$ if $z_i$ is a member of the set $y_1, \cdots, y_n$.

Lemma 1. If the following are fulfilled:

a) $f(x) \equiv 0$ for $x < 0$, $f(x) \equiv x$ for $0 \le x \le 1$, $f(x) \equiv 1$ for $x > 1$.

b) $g(x) \equiv 0$ for $x < 0$, $g(x) \equiv 1$ for $x > 1$.

c) The derivative $g'(x)$ of $g(x)$ exists, is continuous and positive everywhere in the interval $0 \le x \le 1$.





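d) $k$ is an arbitrary but fixed positive integer. For every $m$, $i_{1m} < i_{2m} < \cdots < i_{km}$ are a set of $k$ positive integers subject only to the restriction that the least upper bound $\gamma$ of the sequence $\dfrac{i_{km}}{m + n}$ is less than 1.

Then the expected value $E\left(\prod_{j=1}^{k} v_{i_{jm}}\right)$ satisfies the inequality

(28) $\left| E\left(\prod_{j=1}^{k} v_{i_{jm}}\right) - \prod_{j=1}^{k} \dfrac{g'(a_{\lambda_{jm}})}{\alpha + g'(a_{\lambda_{jm}})} \right| < \varphi(m),$

where $\lambda_{jm} = \dfrac{i_{jm}}{m + n}$ and $a_{\lambda_{jm}}$ $(j = 1, \cdots, k)$ is the root of

(29) $m\, a_{\lambda_{jm}} + n\, g(a_{\lambda_{jm}}) = \lambda_{jm}(m + n)$

and $\varphi(m)$ depends only on $m$ and is such that

(30) $\lim_{m \to \infty} \varphi(m) = 0.$

It is easy to verify that the root $a_{\lambda_{jm}}$ of (29) exists and is unique.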

Proof: It will be sufficient to show that, for any specified set of values of

$$v_{i_{1m}}, \cdots, v_{i_{(r-1)m}}, v_{i_{(r+1)m}}, \cdots, v_{i_{km}} \quad (r = 1, \cdots, k)$$

the conditional probability $P\{v_{i_{rm}} = 1\}$ of the relation in braces satisfies the inequality

(31) $\left| \dfrac{g'(a_{\lambda_{rm}})}{\alpha + g'(a_{\lambda_{rm}})} - P\{v_{i_{rm}} = 1\} \right| < \psi(m),$

where $\psi(m)$ depends only on $m$ and is such that

(32) $\lim_{m \to \infty} \psi(m) = 0.$

For each m let 

(33) V m = • • • Vi (r _ 1)n , t'i(P+1)M ' * ■ Vikm 

be a fixed sequence whose elements are either 0 or 1. We shall consider the 
conditional probability P{vi rm “«},(#“ 0,1) of the relation in braces subject 
to the condition that 

(34) v ilm * v iim , (j = 1, 2, • • • (r - 1), (r + 1), (r + 2), • •. *). 

Let $a$ and $b$ be two numbers such that $0 < a < b < 1$, and let $m^*$ be a
non-negative integer such that $m^* \le m$, and $m^* \le [\gamma(m + n)]$,
where $[\gamma(m + n)]$ denotes the largest integer $\le \gamma(m + n)$. Let
$Q_m(a, b, m^*)$ denote the probability that, if $m^*$ observations are made on
$X$ and $[\gamma(m + n)] - m^*$ observations are made on $Y$, the following
conditions will be fulfilled:

(a) the total number of observations $< a$ is exactly $i_{rm} - 1$

(b) all observations are $< b$

(c) if the $[\gamma(m + n)]$ observations are arranged in ascending order and
if $v_j^* = 0$ or 1 according as the $j$th element is an observation on $X$ or
on $Y$, then

(35) $v^*_{i_{jm}} = \bar{v}_{i_{jm}}$ $(j = 1, 2, \ldots, r - 1)$,

and

(36) $v^*_{i_{jm} - 1} = \bar{v}_{i_{jm}}$ $(j = r + 1, r + 2, \ldots, k)$.


It is easy to see that the probability $P_0$ of the simultaneous fulfillment of
the relations (34) and of $v_{i_{rm}} = 0$ is given by

(37) $P_0 = \int_0^1 \int_0^1 \sum_{m^*} R_m(a, b, m^*)\, m' (1 - b)^{m'-1} (1 - g(b))^{n'}\, da\, db$,

where

(38) $R_m(a, b, m^*) = \binom{m}{m^*} \binom{n}{[\gamma(m+n)] - m^*} Q_m(a, b, m^*)$,

(39) $m' = m - m^*$,

and

(40) $n' = n - [\gamma(m + n)] + m^*$.

Similarly, the probability $P_1$ of the simultaneous fulfillment of the
relations (34) and of $v_{i_{rm}} = 1$ is given by

(41) $P_1 = \int_0^1 \int_0^1 \sum_{m^*} R_m(a, b, m^*)\, n' g'(a) (1 - b)^{m'} (1 - g(b))^{n'-1}\, da\, db$.

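Then

(42) $\frac{P\{v_{i_{rm}} = 0\}}{P\{v_{i_{rm}} = 1\}} = \frac{P_0}{P_1}$.

Let $n_0 = \sum_{j > [\gamma(m+n)]} v_j$ and
$m_0 = m + n - [\gamma(m + n)] - n_0$. The variables

$(z_{i_{rm}} - a_{\lambda_{rm}})$, $(z_{[\gamma(m+n)]} - a_\gamma)$, $\left( \frac{m_0}{n_0} - \frac{\alpha(1 - a_\gamma)}{1 - g(a_\gamma)} \right)$

all converge stochastically to zero.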

Let $P_0(\epsilon)$ and $P_1(\epsilon)$ denote the values of the right members
of (37) and (41), respectively, if the integration is restricted to the region
where $a < b$, $|a - a_{\lambda_{rm}}| < \epsilon$,
$|b - a_\gamma| < \epsilon$ and the summation is restricted to those values of
$m^*$ for which
$\left| \frac{m'}{n'} - \frac{\alpha(1 - a_\gamma)}{1 - g(a_\gamma)} \right| < \epsilon$.
Hence, because of the aforementioned stochastic convergence, for all
sufficiently large $m$

(43) $|P_i(\epsilon) - P_i| < \epsilon$ $(i = 0, 1)$.


Since $P_1 > 0$, for sufficiently large $m$, also


(44) 


Poit) Po . 

m Pi 


Since g(x) and g'{x) are continuous in the interval [0, 1] and hence uniformly 
continuous, it is clear that 


(45) 


iftM 

Pi(«) /(<0 


where $c$ is a fixed constant independent of $m$. From (44) and (45) it follows
easily that, for any arbitrarily small $\epsilon'$,

(46) $\left| \frac{P_0}{P_1} - \frac{\alpha}{g'(a_{\lambda_{rm}})} \right| < \epsilon'$

for sufficiently large $m$.

Since $P\{v_{i_{rm}} = 1\} = \frac{P_1}{P_0 + P_1}$, the required relation (31)
follows. This completes the proof of Lemma 1.

Lemma 2. If conditions a, b, and c of Lemma 1 are satisfied, then

(47) $\lim_{m \to \infty} E\left( \frac{U}{m}; f; g \right) = 2 \int_0^1 \frac{g'(x)}{\alpha + g'(x)}\, dx$

and

(48) $\lim_{m \to \infty} \sigma^2\left( \frac{U}{m} \right) = 0$.
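The limit in Lemma 2 is easy to evaluate numerically. The following sketch reads (47) as $2\int_0^1 g'(x)/(\alpha + g'(x))\,dx$ and approximates the integral by the midpoint rule; the function name, the step count, and the example density are illustrative assumptions, not part of the paper.

```python
def limit_of_mean_runs(g_prime, alpha, steps=100000):
    """Midpoint-rule value of 2 * integral_0^1 g'(x) / (alpha + g'(x)) dx."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h          # midpoint of the i-th subinterval
        gp = g_prime(x)
        total += gp / (alpha + gp)
    return 2.0 * total * h
```

With $g'(x) \equiv 1$ (identical populations) the value is $2/(1 + \alpha)$. With the hypothetical density $g'(x) = 1/2 + x$ and $\alpha = 1$ the value is $2(1 - \log(5/3))$, which is smaller than $2/(1 + \alpha) = 1$.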


Proof: Since

(49) $\frac{U}{m} = \frac{1 - v_1 - v_{m+n}}{m} + \frac{2}{m} \sum_{j=1}^{m+n} v_j - \frac{2}{m} \sum_{j=2}^{m+n} v_{j-1} v_j$,

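we have from Lemma 1,

(50) $E\left( \frac{U}{m} \right) = \frac{2}{m} \left[ \sum_j \frac{g'(a_{jm})}{\alpha + g'(a_{jm})} - \sum_j \left( \frac{g'(a_{jm})}{\alpha + g'(a_{jm})} \right)^2 \right] + \eta(m) + \eta^*(\gamma) = \frac{2}{m} \sum_j \left[ \frac{\alpha g'(a_{jm})}{(\alpha + g'(a_{jm}))^2} \right] + \eta(m) + \eta^*(\gamma)$,

where

(51) $\lim_{m \to \infty} \eta(m) = \lim_{\gamma \to 1} \eta^*(\gamma) = 0$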

and $a_{jm}$ is the root of the equation

(52) $m a_{jm} + n g(a_{jm}) = j$ $(j = 2, \ldots, m + n)$.

From equation (52) it follows that

(53) $\lim_{m \to \infty} (a_{jm} - a_{(j-1)m})(m + n g'(a_{jm})) = 1$

uniformly in $j$. Since $\gamma$ may be chosen arbitrarily near to 1, the
required result (47) follows easily from (50).
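The decomposition of $U/m$ used at the start of this proof can be checked directly. The sketch assumes that $U$ is the number of runs in $V$ and reads (49) as $U/m = (1 - v_1 - v_{m+n})/m + (2/m)\sum v_j - (2/m)\sum v_{j-1}v_j$, which follows from $U = 1 + \sum (v_j - v_{j-1})^2$ for a sequence of zeros and ones. The function names are hypothetical.

```python
def runs_over_m_direct(v, m):
    """U/m with U counted as the number of runs in the 0/1 sequence v."""
    runs = 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)
    return runs / m


def runs_over_m_formula(v, m):
    """U/m from the three terms of the decomposition read from (49)."""
    edge = (1 - v[0] - v[-1]) / m
    singles = 2.0 * sum(v) / m
    pairs = 2.0 * sum(a * b for a, b in zip(v, v[1:])) / m
    return edge + singles - pairs
```

For the sequence 0, 1, 1, 0, 1, 0 with $m = 3$ both functions return $5/3$.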

It remains to consider the variance of $U/m$. The expression

$\frac{1 - v_1 - v_{m+n}}{m} + \frac{2}{m} \sum_{j=1}^{m+n} v_j$

differs from $\frac{2}{\alpha}$ by at most $\frac{1}{m}$, so that its variance
converges to zero with $m \to \infty$.

In order to prove (48), it will be sufficient to show that the variance of

(54) $W = \frac{1}{m} \sum_{j=2}^{m+n} v_{j-1} v_j$

goes to zero with increasing $m$. From Lemma 1 it follows that

(55) $-z(m) < [E(v_i v_j v_k v_l) - E(v_i v_j) E(v_k v_l)] < z(m)$,

where $\lim_{m \to \infty} |z(m)| = 0$, provided only that the integers
$i, j, k, l$ are distinct and $\le \gamma(m + n)$. The variance of $mW$ is the
sum of terms of the type occurring in (55). The number of terms for which
$i, j, k, l$ are distinct is of the order $m^2$. All other terms are of size at
most 2 and their number is of the order $m$. Since the number $\gamma$ may be
chosen arbitrarily near to 1, the variance of $W$ converges to zero with
$m \to \infty$.

This proves Lemma 2.

Lemma 3. If conditions a, b, and c of Lemma 1 are fulfilled, and if (24) holds,
then

(56) $\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\, dx < \frac{1}{1 + \alpha}$.


Let $a_1 < a_3$ be any two real numbers and designate $\frac{a_1 + a_3}{2}$ by
$a_2$. Let $F(x)$ be defined as follows:

(57) $F(a_1) = 0$, $F(x) = (x - a_i) b_i + F(a_i)$, $(a_i \le x \le a_{i+1};\ i = 1, 2)$.

Let $c$ be defined by

(58) $F(a_3) = c(a_3 - a_1)$.

Then it is easy to verify that the maximum of

(59) $T^* = \int_{a_1}^{a_3} \frac{F'(x)}{\alpha + F'(x)}\, dx$

with respect to $b_1$ and $b_2$, subject to the restrictions that $b_1$ and
$b_2$ be non-negative, and that $a_1$, $a_3$ and $c$ be fixed $(c > 0)$, occurs
when and only when

(60) $b_1 = b_2 = c$.
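The maximum property can be illustrated numerically. For the two-piece linear $F$ of (57) the integral (59) equals $\frac{a_3 - a_1}{2}\left[\frac{b_1}{\alpha + b_1} + \frac{b_2}{\alpha + b_2}\right]$, and (58) forces $b_1 + b_2 = 2c$. The sketch below scans $b_1$ over a grid under that constraint; the grid search and the function names are illustrative assumptions, not the authors' argument.

```python
def t_star(b1, b2, alpha, a1, a3):
    """Value of (59) for a two-piece linear F with slopes b1, b2 on equal halves."""
    half = 0.5 * (a3 - a1)
    return half * (b1 / (alpha + b1) + b2 / (alpha + b2))


def best_slope(alpha, a1, a3, c, grid=2000):
    """Grid search over b1 in [0, 2c] with b2 = 2c - b1, as forced by (58)."""
    best_b1, best_val = None, float("-inf")
    for i in range(grid + 1):
        b1 = 2.0 * c * i / grid
        val = t_star(b1, 2.0 * c - b1, alpha, a1, a3)
        if val > best_val:
            best_b1, best_val = b1, val
    return best_b1, best_val
```

Because $t/(\alpha + t)$ is strictly concave for $t \ge 0$, the search returns $b_1 = c$, in agreement with (60).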

Now define 


(61) 

p U = P»i = o, 

1 _ O(Pii) — g(P «-!)/) 
hl 2> 

and 

(t-1,2, ...^jj-0,1,2...). 

Repeated application of the result of the previous paragraph easily gives

(62) $S_j \ge S_{j+1}$.

From (24) it follows that there exists a positive integer $j'$ such that
$S_{j'} > S_{j'+1}$. Obviously

(63) $S_0 = \frac{1}{1 + \alpha}$

and

(64) $\lim_{j \to \infty} S_j = T$.

Hence Lemma 3 is proved.



Proof of Theorem II: Let $\delta_1 > \delta_2 > \cdots > \delta_j > \cdots$ be
an arbitrary but fixed sequence such that $\lim \delta_j = 0$. For
$\delta = \delta_j$, let $I_{1j}, \ldots, I_{k(j)j}$ be a set of closed
intervals such that no two intervals have an interior point in common and
within which, by condition (A), $f'(x)$ and $g'(x)$ exist, are positive, and
continuous. Let $I_{0j}$ be the complementary set (with respect to the whole
line). (It is easy to see that, if condition (A) is fulfilled, such a system
can be constructed.) Let $U_{ij}$ $(i = 1, 2, \ldots, k(j))$ and $U_{0j}$
denote, respectively, the runs caused by the observations which fall in the
intervals $I_{ij}$, $I_{0j}$. Then

(65) $\left| U - \sum_i U_{ij} - U_{0j} \right| \le 2k(j)$.


From condition (A) it follows that, with a probability arbitrarily close to 1,
for sufficiently large $m$,

(66) $U_{0j} < 3 p m \delta_j$,

where p = max £l, (j = 1,2 • • •)• 

Let $[a_i \le x \le b_i]$, $i = 1, 2, \ldots$ denote the interval $I_i$, and
let $m_i$ and $n_i$ denote the number of observations on $X$ and $Y$,
respectively, which fall in the interval $I_i$. Then $\frac{m_i}{m}$ and
$\frac{n_i}{n}$ converge stochastically with increasing $m$ to
$[f(b_i) - f(a_i)]$ and $[g(b_i) - g(a_i)]$, respectively.

Within the interval $I_i$ $(i = 1, 2, \ldots, k)$ we perform the transformation

(67) $X^* = f(X)$, $Y^* = f(Y)$,

which leaves $U_i$ invariant. For fixed $m_i$, $n_i$ the relative distribution
of $X^*$ is uniform and the relative distribution of $Y^*$ fulfills condition
(c) of Lemma 1.

Hence from Lemma 2 we obtain that $\frac{U_i}{m}$ converges stochastically to

(68) $\lim_{m \to \infty} E\left( \frac{U_i}{m} \right) \le \frac{2[f(b_i) - f(a_i)][g(b_i) - g(a_i)]}{[g(b_i) - g(a_i)] + \alpha[f(b_i) - f(a_i)]}$.

It can be verified that the sum of the second members in (68) over all values
$i$ is less than or equal to $\frac{2}{1 + \alpha}$.
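The bound on the sum of the second members can be tried out on an example. The sketch reads the second member of (68) as $2\,\Delta f\,\Delta g/(\Delta g + \alpha\,\Delta f)$, where $\Delta f = f(b_i) - f(a_i)$ and $\Delta g = g(b_i) - g(a_i)$; the function names, the partition, and the choice $g(x) = x^2$ are illustrative assumptions.

```python
def second_member(df, dg, alpha):
    """2*df*dg / (dg + alpha*df); df and dg are assumed positive increments."""
    return 2.0 * df * dg / (dg + alpha * df)


def sum_of_second_members(cuts, f, g, alpha):
    """Sum of the second members over the intervals between consecutive cuts."""
    total = 0.0
    for a, b in zip(cuts, cuts[1:]):
        total += second_member(f(b) - f(a), g(b) - g(a), alpha)
    return total
```

With $f(x) = x$, $g(x) = x^2$, $\alpha = 1$ and the cuts 0, 1/4, 1/2, 3/4, 1 the sum is about 0.910, below $2/(1 + \alpha) = 1$. With $g = f$ the sum equals $2/(1 + \alpha)$ exactly.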

From (24) and condition (A) we get that, for sufficiently small $\delta_j$,
there exists at least one interval for which the first member of (68) is less
than the second member. Hence

(69) $\Sigma < \frac{2}{1 + \alpha}$,

where

(70) $\Sigma = \sum_i \lim_{m \to \infty} E\left( \frac{U_i}{m}; f; g \right)$.

Now take $j$ so large that

(71) $3 p \delta_j < \epsilon$,

where

(72) $0 < 3\epsilon < \frac{2}{1 + \alpha} - \Sigma$.





Since $\frac{U_i}{m}$ converges stochastically to its expected value, from
(65), (66), (70), (71), and (72), it follows that, with a probability
arbitrarily close to 1, for sufficiently large $m$,

(73) $\frac{U}{m} < \frac{2}{1 + \alpha} - \epsilon$.

From (23) and Theorem I we get

(74) $\lim_{m \to \infty} \frac{U_0(m)}{m} = \frac{2}{1 + \alpha}$.

Theorem II follows easily from (73) and (74).


8. Remarks on a proposed test. We have already remarked in Section 3 that
the test proposed by W. R. Thompson is not consistent. To show this, we shall
give two distribution functions $f(x)$ and $g(x)$ such that, although these
functions will be very different, the probability of rejecting the hypothesis
that they are the same will not approach one as the sample number approaches
infinity.

Suppose, to simplify the notation, that the observations have been ordered
according to size, i.e., that $x_1 < x_2 < \cdots < x_m$ and
$y_1 < y_2 < \cdots < y_n$. Suppose further that $m = n$, and that the test is
to be made on a level of significance $\beta > 0$. In the right member of (2)
we need not exhibit $n$ and shall replace $k$ and $k'$ by $k(m)$ and $k'(m)$ to
show the dependence on $m$. We have, under the null hypothesis,

(75) $P\{x_{k(m)} < y_{k'(m)}\} = \psi(m, k(m), k'(m)) = \beta$.


The sequence $\frac{k(m)}{m}$ is bounded, so that there exists a monotonically
increasing subsequence $m_1, m_2, \ldots$ of the sequence of integers
$1, 2, \ldots$ and a number $h$, $0 \le h \le 1$, such that

(76) $\lim_{i \to \infty} \frac{k(m_i)}{m_i} = h$.

It is easy to see that then also

(77) $\lim_{i \to \infty} \frac{k'(m_i)}{m_i} = h$.


We shall now assume that $0 < h < 1$. If $h = 0$ or 1 only a trivial alteration
will be needed in the argument to follow. Let $\epsilon$ and $\delta$ be
arbitrarily small positive numbers. We now consider two populations, A and B,
described as follows:

A) $f(x) \equiv g(x) \equiv x$ $(0 \le x \le 1)$,

B) $f(x) \equiv x$ $(0 \le x \le 1)$,

$g(x) = g(a_i) + \frac{g(a_{i+1}) - g(a_i)}{a_{i+1} - a_i}(x - a_i)$ $(a_i \le x \le a_{i+1};\ i = 0, 1, \ldots, 4)$,

where

$a_0 = 0$, $g(a_0) = 0$;
$a_1 = h - 2\delta > 0$, $g(a_1) = 0$;
$a_2 = h - \delta$, $g(a_2) = a_2$;
$a_3 = h + \delta < 1 - \delta$, $g(a_3) = a_3$;
$a_4 = 1 - \delta$, $g(a_4) = 1$;
$a_5 = 1$, $g(a_5) = 1$.


The definition of $f(x)$ and $g(x)$ outside the interval $0 \le x \le 1$ is
obvious. It will be shown that even for such different populations as A and B
and for samples of size greater than that of any arbitrarily assigned number,
the probability of rejecting the null hypothesis if B is true will be at most
$\beta + \epsilon$.
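A piecewise-linear $g$ of the kind described for population B can be sketched as follows. The breakpoints and values used here, namely $a = (0, h - 2\delta, h - \delta, h + \delta, 1 - \delta, 1)$ with $g(a) = (0, 0, h - \delta, h + \delta, 1, 1)$, are an assumed reading of the construction, and the function name is hypothetical. The point of the example is only that $g$ agrees with $f(x) = x$ on the middle interval while differing from it elsewhere.

```python
def make_population_b(h, delta):
    """Return (breakpoints, g) for a piecewise-linear g on [0, 1].

    Assumed breakpoints: 0, h-2d, h-d, h+d, 1-d, 1 with values
    0, 0, h-d, h+d, 1, 1; requires 0 < h - 2d and h + d < 1 - d.
    """
    a = [0.0, h - 2 * delta, h - delta, h + delta, 1 - delta, 1.0]
    ga = [0.0, 0.0, h - delta, h + delta, 1.0, 1.0]

    def g(x):
        if x <= 0.0:
            return 0.0
        if x >= 1.0:
            return 1.0
        for i in range(5):
            if a[i] <= x <= a[i + 1]:
                slope = (ga[i + 1] - ga[i]) / (a[i + 1] - a[i])
                return ga[i] + slope * (x - a[i])

    return a, g
```

With $h = 0.5$ and $\delta = 0.1$, $g(x) = x$ for $0.4 \le x \le 0.6$, so the three intervals cut off by $a_2$ and $a_3$ carry the same probabilities under A and under B, although $g$ vanishes up to 0.3 and equals 1 from 0.9 onward.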

Let $h_1, h_2, h_3$ denote the number of observations on $X$ which fall in the
intervals $0 < x < a_2$, $a_2 < x < a_3$, $a_3 < x < 1$, respectively ($m$
fixed, of course). Let $h_1', h_2', h_3'$ be the corresponding numbers for $Y$.
For a fixed $m$, the probability of a set $h_1, h_2, h_3, h_1', h_2', h_3'$ is
the same whether the sample be drawn from the population A or B. From (76),
(77), and the multinomial law it follows that for all sufficiently large $m_i$,
the probability is at least $1 - \epsilon$ of the occurrence of a set
$h_1, h_2, h_3, h_1', h_2', h_3'$ for which $x_{k(m_i)}$ and $y_{k'(m_i)}$ will
both fall in the interval $a_2 < x < a_3$. Furthermore it is obvious that for
all samples with fixed $h_2$, $h_2'$ the distribution within the interval
$a_2 < x < a_3$ is the same whether the sample came from the population A or B.
Hence even when the sample is drawn from the population B, the first member of
(75) is $\le \beta + \epsilon$. This completes the proof of the inconsistency
of the test based on (75).

This test is consistent if the alternatives to the null hypothesis are limited,
for example, to those where $g(x) \equiv f(x + c)$, $c$ a constant.

REFERENCES

[1] William R. Thompson, Annals of Math. Stat., Vol. 9 (1938), p. 281.

[2] Maurice Fréchet, Généralités sur les Probabilités. Variables aléatoires,
Paris (1937).

[3] J. Neyman and E. S. Pearson, Statistical Research Memoirs, University
College, London, Vol. 1 (1936).


Columbia University, 
New York, N. Y. 



THE SUBSTITUTIVE MEAN AND CERTAIN SUBCLASSES OF THIS 

GENERAL MEAN 

By Edward L. Dodd 

1. Introduction. No general agreement has been reached, so far as I know, 
as to what constitutes a mean. A necessary condition which appears to meet 
with general approval is that a single-valued mean of a set of numbers all equal 
to a constant c should itself be equal to c. However, there appears to be some 
valid objection against imposing any other proposed condition as necessary.

Of course, intermediacy is a condition that suggests itself at once. Indeed, 
in certain mean value theorems in general analysis—such as the First Theorem 
of the Mean for integral calculus, which I mention in Section 3—intermediacy 
is the main feature. 

However, O. Chisini [1] insisted that intermediacy or internality is not the 
chief characteristic of a statistical mean. Rather, a mean is a number to take 
the place, by substitution, of each of a set of numbers in general different. 
Such a mean may well be called a representative or substitutive mean. 

Chisini defined $m$ to be a mean of $x_1, x_2, \ldots, x_n$, relative to a
function $F$, provided that

(1.1) $F(m, m, \ldots, m) = F(x_1, x_2, \ldots, x_n)$.

If, for example,

(1.2) $F(x_1, x_2, \ldots, x_n) = \Sigma x_i^2 = \Sigma m^2 = nm^2$,

the mean $m$ thus obtained is the root-mean-square

(1.3) $m = \pm [(1/n)\Sigma x_i^2]^{1/2}$.

The choice of F, Chisini noted, depended upon the use to be made of the 
mean. 
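Chisini's condition (1.1) can be turned into a small numerical recipe: evaluate $F$ at the data, then look for an $m$ with $F(m, \ldots, m)$ equal to that value. The sketch below does this by bisection on a bracket supplied by the caller; the function name and the bracketing interface are illustrative assumptions, and one value of $m$ is returned per bracket, which matches the multiple-valued character of the mean.

```python
def chisini_mean(F, xs, lo, hi, tol=1e-12):
    """Find m in [lo, hi] with F([m]*n) == F(xs), assuming a sign change there."""
    n = len(xs)
    target = F(xs)

    def gap(m):
        return F([m] * n) - target

    g_lo, g_hi = gap(lo), gap(hi)
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("F(m, ..., m) - F(x) does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = gap(mid)
        if g_mid == 0:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)
```

With $F$ the sum of squares and the values $-2$, 1, 1, the bracket $[0, 5]$ yields $2^{1/2}$ and the bracket $[-5, 0]$ yields $-2^{1/2}$, the two values of the root-mean-square in (1.3).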

Suppose now that $f(x_1, x_2, \ldots, x_n)$ is such a function that one value of

(1.4) $f(x, x, \ldots, x) = x$.

And suppose that this $f$ is taken as a particular $F$ for (1.1) to determine a
mean $m$ implicitly; thus

(1.5) $f(m, m, \ldots, m) = f(x_1, x_2, \ldots, x_n)$.

Then, from (1.5) and (1.4) it follows that one value of

(1.6) $f(x_1, x_2, \ldots, x_n) = m$.

And thus $f$ determines the mean $m$ both explicitly and implicitly.



It should be noted that the $F = \Sigma x_i^2$ in (1.2) is not itself a mean of
the $x_i$.

If, in (1.2), we take $x_1 = -2$, $x_2 = 1$, $x_3 = 1$, then the double-valued
mean $m = \pm 2^{1/2}$ results. Now $-2^{1/2}$ is internal; i.e.,
$-2 < -2^{1/2} < 1$; but $2^{1/2}$ is external, for $2^{1/2} > 1 > -2$. But
since here $\Sigma x_i = 0$, it follows also that the standard deviation of
$-2$, 1, 1, is the external mean $2^{1/2}$. Chisini [1], indeed, used the root
mean square to show the possibility of external means. External means have
been noted by other writers [2-7].

It is noteworthy that a number of writers [8-12] have used the condition
(1.4) (in general, with $f$ single-valued) as one of a set of axioms to
characterize particular means. Sometimes, this has appeared in weaker form
as $f(1, 1, \ldots, 1) = 1$.

This paper will be concerned primarily with the mean of a finite number $n$
of variates, $x_1, x_2, \ldots, x_n$. Possible generalizations will be
mentioned briefly in Section 8.

In the conception of the substitutive mean, m, as I have been using it for some 
time, emphasis is laid upon the explicit form for m; and provision is made for 
multiple values. 

Definition of the Substitutive Mean. Let $f(x_1, x_2, \ldots, x_n)$ be a
function of $n$ variables, $x_1, x_2, \ldots, x_n$, defined at least for one
set of equal values, $x_i = k$. If $c$ is any number such that
$f(c, c, \ldots, c)$ is defined, let one value of

(1.7) $f(c, c, \ldots, c) = c$.

Then $f(x_1, x_2, \ldots, x_n)$ will be said to be a substitutive mean of
$x_1, x_2, \ldots, x_n$.

If an original formulation of a problem does not assign to a function a value
when the variables are all equal, it is sometimes possible to assign such values
by continuity considerations, such as are commonly used in the “evaluation”
of indeterminate forms. This will be discussed in Section 6.

In the following, when the word mean is used, it will designate the substitu¬ 
tive mean as defined above. 

2. Classification of Means already made. Some general classes of means 
have already been distinguished. One important basis for a classification of 
means is the kind of data to be used. The data may be only qualitatively 
distinguishable. Then numbers may be assigned to qualities. For dealing in 
a very general way with all kinds of data, C. Gini and L. Galvani [13], and 
G. Pietra [14], distinguished between data in rectilineal series, in cyclical series, 
and in unconnected series. These three classes are associated respectively with 
the straight line, the circle, and a regular polyhedron (in three dimensions, the 
regular tetrahedron, and in n dimensions, a polyhedron with n + 1 vertices each 
at the same distance from each of the other n vertices). 

For one definition of the arithmetic mean of a cyclical series, Gini uses the
center of gravity principle; and this mean is computed with the aid of sines and
cosines. By mechanical means, such an arithmetic mean of dates, for example,
of dates of weddings, as days of a year can be found. On the rim of a wheel
delicately suspended and marked off for the 365 days or 366 days of a year, let
small weights proportional to the number of weddings on a day be placed in the
spaces assigned to the individual days. Then when the wheel comes to rest,
the arithmetic mean of the dates will be found at the lowest point of the rim.
In the special case where the center of gravity of the system is at the center of
the circle, the mean is indeterminate, or we may say that every day is a mean
day.
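The center of gravity computation with sines and cosines can be sketched as follows. This is a minimal illustration of the principle described above, not Gini's own formulation: the function name, the mapping of day $d$ to the angle $2\pi d/\text{period}$, and the tolerance used to detect a center of gravity at the center of the circle are all assumptions.

```python
import math


def circular_mean_day(days, weights=None, period=365):
    """Center-of-gravity mean of dates on a circle; None when indeterminate."""
    if weights is None:
        weights = [1.0] * len(days)
    c = sum(w * math.cos(2 * math.pi * d / period) for d, w in zip(days, weights))
    s = sum(w * math.sin(2 * math.pi * d / period) for d, w in zip(days, weights))
    # Center of gravity at the center of the circle: every day is a mean day.
    if math.hypot(c, s) < 1e-12 * sum(weights):
        return None
    angle = math.atan2(s, c) % (2 * math.pi)
    return angle * period / (2 * math.pi)
```

Two equally weighted dates at days 10 and 20 give the mean day 15, while two equally weighted dates half a year apart balance the wheel and give an indeterminate mean.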

Also, for cyclical series the arithmetic mean and the median are defined by 
other methods, using such principles as minimizing the sum of the squares of 
deviations or the sum of the absolute deviations. 

The properties of means may be made the basis of a classification, either those 
properties which have been evolved by writers [8-12], [15-18] who have char¬ 
acterized specific means by sets of axioms, or those properties which seem of 
special importance in making distinctions. Two such properties will now be 
mentioned. 

Gini [19] recognizes two large classes of means: “A) medie ferme, B) medie 
lasche,” the latter (loose) class including the median and mode for which values 
do not depend upon all the data. To describe this latter mean $m$ of arguments
$x_i$, we might write $\partial m/\partial x_i = 0$ as applying to several if
not most of the arguments over wide ranges instead of at isolated points.

Subclasses of A or firm means as given by Gini will be discussed in Section 4. 

Another rather large classification distinguishes between simple means and 
their weighted forms. In a case often encountered, where the weights are 
whole numbers indicating frequencies of occurrence this distinction is of little 
significance. In the more general case, however, where weights may give ratings 
of the efficiency of measuring instruments or the weights may be negative [6, 
20], more direct attention needs to be paid the weighted forms. 

To supplement classifications already proposed, I am indicating in the next
section a descent from the substitutive mean, the most general of all means,
down through two classes of means less general, which I am calling the
summational mean and the quasi-arithmetic mean, to the more specific mean known
as the associative mean, studied in particular by M. Nagumo [21],
A. Kolmogoroff [22], and B. de Finetti [2].

The foregoing subclasses of the general or substitutive mean are based 
primarily on structure, the way the mean is formed. 

3. The Summational Mean, Quasi-Arithmetic Mean, and Associative Mean. 

The summational mean, now to be defined, is a generalization of the weighted
arithmetic mean,

(3.1) $W = \frac{c_1x_1 + c_2x_2 + \cdots + c_nx_n}{c_1 + c_2 + \cdots + c_n}$, $\Sigma c_i \ne 0$.

It is to be noted that although $W$ is not a symmetric function of $x_i$, $W$
is a symmetric function of $c_ix_i$. In the generalization $Q$, the following
features of $W$ are retained:

1. Certain weights $c_i$ being given, $Q$ is a symmetric function of $c_ix_i$.

2. This $Q$ may be determined from sums of $n$ terms, each term involving
one and only one $x_i$.

Definition. Let $\Sigma$ denote a summation for $i = 1, 2, \ldots, n$. Suppose
that

(3.2) $F[y, \Sigma f_1(c_ix_i, y), \Sigma f_2(c_ix_i, y), \ldots, \Sigma f_k(c_ix_i, y)] = 0$

has a solution, $y = Q$, which is a substitutive mean of
$x_1, x_2, \ldots, x_n$. Then $Q$ will be called a summational mean of
$x_1, x_2, \ldots, x_n$, relative to the functions $f_1, f_2, \ldots, f_k$,
and $F$.

Sometimes it is possible to express $Q$ as

(3.3) $Q = G[\Sigma g_1(c_ix_i), \Sigma g_2(c_ix_i), \ldots, \Sigma g_k(c_ix_i)]$.

Among summational means, those of most frequent use involve in a special
way but one summation. Thus with $\psi(x)$ a function, which would usually be
taken as continuous, this $m$ satisfies

(3.4) $\psi(m)\Sigma c_i = \Sigma c_i\psi(x_i)$.

But this, with $c_i > 0$, is just an algebraic analogue or prologue to the
First Theorem of the Mean for integral calculus, with the $c_i$ to be replaced
by a positive integrable function. Without further specification, this mean
$m$ may have an uncountably infinite number of values. But if it be required
that $\psi(x)$ be a continuous increasing function, and that $c_i > 0$, then
$m$ is unique.

In a series of papers, C. E. Bonferroni [20], [23-27] used means such as $m$ in
(3.4) for statistical and actuarial problems. And, as he had in mind [28]
distinctly the notion of substitution, he was in a sense a forerunner of
Chisini. E. L. Dodd [29] made use of a mean $m$ defined with the aid of $n$
continuous increasing functions $\psi_i(x)$, thus:

(3.5) $\Sigma c_i\psi_i(m) = \Sigma c_i\psi_i(x_i)$, $c_i > 0$.

If $g_i(x) = c_i\psi_i(x)$, this can be written

(3.6) $\Sigma g_i(m) = \Sigma g_i(x_i)$.

In one paper, C. E. Bonferroni [20], as already noted, used weights which
might be either positive or negative.

Some such mean as $m$ in (3.4) has been used by a number of writers. Here
$\psi(m)$ is a weighted arithmetic mean of $\psi(x_i)$; and thus it is natural
to call $m$ a quasi-arithmetic mean of $x_i$.

Definition. Let $\Sigma c_i \ne 0$. If $m$ is a solution of

(3.4) $\psi(m)\Sigma c_i = \Sigma c_i\psi(x_i)$,

then $m$ will be called a quasi-arithmetic mean of $x_i$, with weights $c_i$,
and relative to the function $\psi(x)$.

Sufficient conditions for the existence of this mean $m$ are: (1) That
$\psi(x)$ be continuous in the interval $I$, finite or infinite, in which the
observations $x_i$ lie; (2) That either $c_i > 0$ for each $i$, or that
$\psi(x)$ take on all real values, as $x$ runs through $I$.
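When $\psi$ is continuous and increasing, the definition gives $m = \psi^{-1}(\Sigma c_i\psi(x_i)/\Sigma c_i)$. The sketch below computes the mean that way; it assumes the caller supplies the inverse of $\psi$ explicitly, and the function name is hypothetical.

```python
import math


def quasi_arithmetic_mean(xs, cs, psi, psi_inverse):
    """Solve psi(m) * sum(c) = sum(c * psi(x)) for an invertible psi."""
    total_weight = sum(cs)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    level = sum(c * psi(x) for c, x in zip(cs, xs)) / total_weight
    return psi_inverse(level)
```

Taking $\psi(x) = \log x$ gives the geometric mean, so that the numbers 1 and 4 with equal weights have the mean 2; taking $\psi(x) = x^2$ on positive numbers gives the root mean square.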

It will be helpful to picture geometrically the double transformation or
mirroring represented by (3.4). Points $x_i$ on the horizontal axis are carried
vertically to the curve $y = \psi(x)$ and then reflected horizontally to the
$y$ axis. For the points $y_i$ on the $y$ axis thus obtained the arithmetic
mean $\bar{y}$ or “center of gravity” is obtained. Then $\bar{y}$ is carried
horizontally to the curve and reflected vertically to the $x$-axis. The
abscissas $m$ of points on the $x$-axis thus obtained are means of the given
$x_i$, relative to this $\psi(x)$.

It may happen (Dodd [3 p. 746]) that the curve $y = \psi(x)$ contains
horizontal segments, as in the curve for temperature $y$ of ice-water-steam
which has absorbed a quantity $x$ of heat. In this case the mean $m$ may be an
“interval,” an uncountable set of real numbers. Indeterminateness over an
interval is a well known feature of the median of an even number of variates.
In fact, a paper of D. Jackson [30] was for the purpose of indicating one
method of selecting a single value from this interval of indeterminateness, as
a median.

It may be noted that a mean of $n$ variables becomes, when $n = 1$, a function
of a single variable; and thus it appears possible to implant in a mean of $n$
variables almost any peculiarity found in a function of one variable.

A special case of the quasi-arithmetic mean is the associative mean $m$ which
under some general conditions has been shown [2, 21, 22] to satisfy

(3.7) $n\psi(m) = \Sigma\psi(x_i)$, $i = 1, 2, \ldots, n$;

where $\psi(x)$ is a continuous increasing function.

If $f_n(x_1, x_2, \ldots, x_n)$ is an associative mean, then by definition,
$f_n(x_1, x_2, \ldots, x_n)$ is unaltered when any $k$ of the $n$ variates are
each replaced by the mean $f_k$ of that set.
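The associative property can be observed on a simple case. The sketch takes $\psi(x) = x^p$ on positive numbers, so that (3.7) gives the power mean, and replaces the first $k$ variates by their own mean; the function names and the choice of the first $k$ variates are illustrative assumptions.

```python
def power_mean(xs, p):
    """Mean m with n * m**p = sum(x**p), for positive xs and p != 0."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)


def replace_subset_by_its_mean(xs, k, p):
    """Replace each of the first k variates by the power mean of those k."""
    sub = power_mean(xs[:k], p)
    return [sub] * k + list(xs[k:])
```

For the variates 1, 2, 3, 4, 5 with $p = 2$, replacing the first three by their root mean square leaves the root mean square of all five unchanged, since the sum of squares is preserved.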

4. The Gini means as summational. Having distinguished firm means from 
loose means, Gini [19] noted that in the former class, a variate might appear as 
a base, as an exponent, or both as base and exponent. In general, these variates 
are to be positive. Gini then listed ten means of a decidedly broad character, 
some of them generalizing the combinatorial means treated by A. Durand [31] 
and O. Dunkel [32]. See also G. Pietra [37].

These ten means involve only the four simple arithmetic operations and root 
extraction. For many purposes they are best expressed in the form given by 
the author. However, to show that these means are summational, logarithms 
will be used to reduce products to sums. 





Let

(4.1)
$S_p = \Sigma x_i^p$, $i = 1, 2, \ldots, n$;
${}_nC_c = n!/c!(n - c)!$, a binomial coefficient;
$P_c$ be any one of the ${}_nC_c$ products of $c$ different elements taken
from $x_1, x_2, \ldots, x_n$;
$P_c^p = (P_c)^p$, the $p$th power of $P_c$;
$Z_c = \Sigma P_c$, the sum of all the ${}_nC_c$ products $P_c$;
$Z_c^p = \Sigma P_c^p$.

In the expressions which follow, it is assumed that the denominators are not
zero.

The ten means, as defined in Gini's Equations I, II, ..., X, will be designated
here by $m_1, m_2, \ldots, m_{10}$; and their logarithms, with base arbitrary,
will now be given.

(4.2)
$\log m_1 = (\log S_p - \log n)/p$
$\log m_2 = (\log Z_c - \log {}_nC_c)/c$
$\log m_3 = (\log Z_c^p - \log {}_nC_c)/cp$
$\log m_4 = (\log S_p - \log S_q)/(p - q)$
$\log m_5 = \Sigma x_i^p \log x_i / S_p$
$\log m_6 = (\log Z_c - \log Z_d - \log {}_nC_c + \log {}_nC_d)/(c - d)$
$\log m_7 = (\log Z_c^p - \log Z_d^p - \log {}_nC_c + \log {}_nC_d)/(c - d)p$
$\log m_8 = (\log Z_c^p - \log Z_c^q)/c(p - q)$
$\log m_9 = \Sigma P_c^q \log P_c / cZ_c^q$
$\log m_{10} = (\log Z_c^p - \log Z_d^q - \log {}_nC_c + \log {}_nC_d)/(cp - dq)$.

As noted by the author, the foregoing include some well known special means.
Thus, $m_1$ is the power mean, which for $p = 1, 2, -1$, becomes respectively
the arithmetic mean, the root mean square, and the harmonic mean. If
$p \to 0$, then the limit of $m_3$ and of $m_7$ is the geometric mean. If
$p = 0, 1, 2$, and $q = p - 1$, then $m_4$ is respectively the harmonic, the
arithmetic, and the contra-harmonic mean.
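The special cases just listed can be verified on a small example. The sketch reads the first and fourth means as $m_1 = (S_p/n)^{1/p}$ and $m_4 = (S_p/S_q)^{1/(p-q)}$; the function names are hypothetical and the variates are assumed positive.

```python
def power_sum(xs, p):
    """S_p, the sum of the p-th powers of the variates."""
    return sum(x ** p for x in xs)


def gini_m1(xs, p):
    """Power mean (S_p / n) ** (1/p), p != 0."""
    return (power_sum(xs, p) / len(xs)) ** (1.0 / p)


def gini_m4(xs, p, q):
    """Biplanar mean (S_p / S_q) ** (1/(p - q)), p != q."""
    return (power_sum(xs, p) / power_sum(xs, q)) ** (1.0 / (p - q))
```

For the variates 1, 2, 4 the arithmetic mean is 7/3 and the harmonic mean is 12/7; $m_4$ with $(p, q) = (0, -1)$, $(1, 0)$, $(2, 1)$ gives 12/7, 7/3, and the contra-harmonic mean 21/7 = 3.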

For each of the ten means, Gini gives an appropriate name. Those involving 
binomial coefficients are combinatorial, a mean like the contra-harmonic with 
denominator other than a constant is biplanar, the more simple means 
monoplanar. 

When in the following, I show that certain combinatorial expressions may be
replaced by sums, it is not implied that this replacement would simplify
computation.

To prove that $m_1, m_2, \ldots, m_{10}$ are all summational means, it may be
noted that $n, p, q, c, d$, ${}_nC_c$, and ${}_nC_d$ are constants. Moreover,
$S_p$ is the symmetric sum of the $p$th powers of $x_i$, thus with only one
$x_i$ in each term, and $i = 1, 2, \ldots, n$. And, since $Z_c$, $Z_c^p$,
$Z_d$, and $Z_d^q$ are symmetric polynomials in the $x_i$, they may be
expressed as polynomials in $S_1, S_2, \ldots$, by a well known theorem of
algebra. Hence among the ten means, the only one that requires special
attention is the ninth mean, $m_9$.

To show that $m_9$ is a summational mean, we need only examine the numerator
of the right member. Let this numerator be $N$.

(4.3) $N = \Sigma P_c^q \log P_c$.

Then

(4.4) $qN = (x_1^q x_2^q \cdots x_c^q)(\log x_1^q + \cdots + \log x_c^q) + \cdots$.

Thus, if we set $y_i = x_i^q$, we may write

(4.5) $qN = (y_1 y_2 \cdots y_c)(\log y_1 + \cdots + \log y_c) + \cdots$.

The coefficient of $\log y_1$ in this right member is the sum of all products
of $c$ different factors which include $y_1$.

Now, let $Y_r$ be the sum of the products of $r$ different factors taken from
$y_1, y_2, \ldots, y_n$; and let $T_r$ be the sum of the products of $r$
different factors taken from $y_2, y_3, \ldots, y_n$. Then it is evident that

(4.6) $Y_r = T_r + y_1T_{r-1}$; $T_r = Y_r - y_1T_{r-1}$.

If, now, we set $Y_0 = 1$, it follows that

(4.7) $T_{c-1} = Y_{c-1} - y_1Y_{c-2} + y_1^2Y_{c-3} - \cdots + (-1)^{c-1}y_1^{c-1}Y_0$.

Hence, in $qN$, the coefficient of $\log y_1$ is

(4.8) $y_1T_{c-1} = y_1Y_{c-1} - y_1^2Y_{c-2} + \cdots + (-1)^{c-1}y_1^cY_0$.

Thus in $qN$, the terms containing $\log y_1$ are

(4.9) $T_{c-1}y_1 \log y_1 = Y_{c-1}y_1 \log y_1 - Y_{c-2}y_1^2 \log y_1 + \cdots + (-1)^{c-1}Y_0y_1^c \log y_1$.

Now let

(4.10) $U_r = \Sigma y_i^r \log y_i$, $i = 1, 2, \ldots, n$.

Then,

(4.11) $qN = Y_{c-1}U_1 - Y_{c-2}U_2 + \cdots + (-1)^{c-1}Y_0U_c$.

Thus, $qN$ is here constructed from sums of $n$ terms with but a single $y_i$
in any term.

Likewise, with $y_i$ replaced by $x_i$, a term contains but a single $x_i$.
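The reduction to sums can be checked numerically. The sketch reads (4.11) as $qN = Y_{c-1}U_1 - Y_{c-2}U_2 + \cdots + (-1)^{c-1}Y_0U_c$ with $Y_0 = 1$, and compares it with the direct sum of $P \log P$ over all products $P$ of $c$ different $y_i$. The function names are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def elementary_sum(ys, r):
    """Y_r: sum of the products of r different factors taken from ys (Y_0 = 1)."""
    return sum(math.prod(t) for t in combinations(ys, r))


def product_log_sum_direct(ys, c):
    """Sum over all c-subsets of (product) * (sum of logs of the factors)."""
    return sum(math.prod(t) * sum(math.log(y) for y in t)
               for t in combinations(ys, c))


def product_log_sum_via_sums(ys, c):
    """The same quantity from the single-variable sums U_r = sum(y**r * log y)."""
    total = 0.0
    for r in range(1, c + 1):
        u_r = sum(y ** r * math.log(y) for y in ys)
        total += (-1) ** (r - 1) * elementary_sum(ys, c - r) * u_r
    return total
```

For positive $y_i$ the two functions agree for every $c$ from 1 to $n$, up to rounding.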





5. Transformations. A function $f(x_1, x_2, \ldots, x_n)$ is not in general a
mean of its arguments $x_i$. However, it is often possible to make a
substitution $x_i = \phi(y_i)$ so that

(5.1) $f[\phi(y_1), \ldots, \phi(y_n)] = g(y_1, y_2, \ldots, y_n)$

is a mean of its arguments $y_i$.

The required substitution is sometimes obvious, as in the case of the estimate
$s$ of scale

(5.2) $s = [(1/n)\Sigma(x_i - m)^2]^{1/2} = [(1/n)\Sigma y_i^2]^{1/2}$.

Here $s$ is a mean of $y_i$, although it is not a mean of $x_i$.

Definition. Let $y = \psi(x)$, in general multiple valued, be defined in an
interval $I$, finite or infinite, the values of $y$ lying in an interval $J$.
Suppose that for each $y$ in $J$, there is at least one $x$ in $I$ such that
$\psi(x) = y$. Let any such $x$ be designated by $\phi(y)$. Then $\phi(y)$ will
be called the inverse of $\psi(x)$. It follows that one value of

(5.3) $\psi(\phi(y)) = y$.

Theorem. Let

(5.4) $z = f(x_1, x_2, \ldots, x_n)$,

in general multiple valued, be defined when each $x_i$ is in some interval $I$,
finite or infinite. With $x$ in $I$, set

(5.5) $\psi(x) = f(x, x, \ldots, x)$;

and suppose that $y = \psi(x)$ has an inverse, $x = \phi(y)$ defined in $J$.
Let $x_i = \phi(y_i)$ be substituted into $f$ to form the function

(5.6) $w = f[\phi(y_1), \phi(y_2), \ldots, \phi(y_n)] = g(y_1, y_2, \ldots, y_n)$.

Then $w$ is a mean of $y_i$, defined when $y_i$ is in $J$. It is thus a mean of
$\psi(x_i)$, where $x_i$ is in $I$.

If further, $\psi(x)$ is a continuous increasing function of $x$, then for a
given set of $x_i$, the values of $z$ and $w$ are identical. The same is true
for a given set of $n$ values $y_i$.

Proof. If each $y_i = c$, a number in $J$, then

(5.7) $f[\phi(y_1), \ldots, \phi(y_n)] = f[\phi(c), \ldots, \phi(c)] = \psi[\phi(c)]$.

And one value of $\psi[\phi(c)]$ is $c$, from the definition of the inverse
function $\phi(y)$. Moreover, if a number $c'$ is taken in $I$, then
$\psi(c')$ is some number in $J$, which we may call $c$; and the argument above
is applicable. Finally, if $\psi(x)$ is continuous and increasing, then a
number $x_i$ in $I$ is associated with one and only one $y_i$ in $J$, and vice
versa. Thus $w$ and $z$ become identical.
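The theorem can be followed on a concrete function. The sketch takes the hypothetical example $f = \Sigma x_i^2$ on non-negative numbers, which is not a mean of its arguments; then $\psi(x) = f(x, \ldots, x) = nx^2$ has the inverse $\phi(y) = (y/n)^{1/2}$, and the transformed function $g$ turns out to be the arithmetic mean of the $y_i$. All names are illustrative.

```python
import math


def f(xs):
    """Sum of squares: not a mean, since f(c, ..., c) = n * c**2."""
    return sum(x * x for x in xs)


def psi(x, n):
    """psi(x) = f(x, x, ..., x) with n equal arguments."""
    return f([x] * n)


def phi(y, n):
    """Inverse of psi on the non-negative numbers."""
    return math.sqrt(y / n)


def g(ys):
    """w = f[phi(y_1), ..., phi(y_n)], a mean of the y_i."""
    n = len(ys)
    return f([phi(y, n) for y in ys])
```

Here `g([c] * n)` returns `c`, as a mean must, and for a given set of non-negative $x_i$ the value `f(xs)` coincides with `g` evaluated at $y_i = \psi(x_i)$.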

In the foregoing, we started with $f$ which is not a mean of its arguments
$x_i$, and obtained $g$ which is a mean of $y_i$. Something like the reverse of
this is possible. The last member of (5.2) is a mean of $y_i$. It was obtained
by treating $m$ as a constant, with respect to $x_i$. If, however, $m$ is an
estimate for location and is taken as $(1/n)\Sigma x_i$, and this is
substituted into (5.2) then

(5.8) $s = \{[(n - 1)/n^2]\Sigma x_i^2 - (2/n^2)\Sigma x_ix_j\}^{1/2}$, $i < j$.

This $s$ is now not a mean of $x_i$; for if $x_i$ equal any constant $c$, then
$s = 0$. Furthermore, there exists no single valued continuous increasing
function $x = \phi(y)$ such that if $x_i = \phi(y_i)$ is substituted into
(5.8), $s$ will be a mean of the $y_i$. Thus the elimination of $m$ from (5.2)
interferes with the status of $s$ as a mean of the $x_i$.


6. Indeterminate Forms that arise in testing for Means. Sometimes a function
$f$ is substantially continuous. But the investigation leading to the function
fails to assign to the function a value for certain values of the argument $x$,
or arguments, $x_1, x_2, \ldots, x_n$. However, values are often assignable
which will make the function continuous. This is the usual occurrence when, in
curve fitting, parameters are estimated. In general, the measurements are
assumed to be not all alike. However, when a general function such as
$\Sigma x_i/n$ for location is obtained, we do not hesitate to assign to this
function the value $c$ when each $x_i = c$, to make the function continuous.

As another illustration of “indeterminate forms,” consider the Jackson [30]
median, $M$, of four numbers $x_1 \le x_2 < x_3 \le x_4$, viz.,

(6.1) $M = (x_4x_3 - x_2x_1)/(x_4 + x_3 - x_2 - x_1)$.

A direct substitution of $x_i = c$ renders $M$ indeterminate. But if
$x_i \to c$, indeed, if merely $x_2 \to c$ and $x_3 \to c$, so also does $M$.
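The behavior of this median near equal arguments can be sketched in code. The formula used is the reading $M = (x_4x_3 - x_2x_1)/(x_4 + x_3 - x_2 - x_1)$ of (6.1); returning the common value when the denominator vanishes is the assignment by continuity discussed in this section, and the function name is hypothetical.

```python
def jackson_median_of_four(x1, x2, x3, x4):
    """Median of four ordered numbers x1 <= x2 <= x3 <= x4 by the formula (6.1).

    With the arguments ordered, the denominator vanishes only when all four
    numbers are equal; the common value is then assigned by continuity.
    """
    den = x4 + x3 - x2 - x1
    if den == 0:
        return x1
    return (x4 * x3 - x2 * x1) / den
```

For 0, 1, 2, 3 the value is 1.5, and for 1.999, 2, 2, 2.001 it is 2 up to rounding, in agreement with the limit.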

In a recent paper, R. Cisbani [33] generalizes means suggested by Dunkel 
[32] and L. Galvani [34] by setting up 

t « -i—i/* 

n 1 £ (a 1 + ih) * /, J , j ^ 0, x * 0; 


and letting $n \to \infty$. There results an integral with the value

(6.3) $\phi_j(x) = \left[\dfrac{b^{x+j} - a^{x+j}}{(x/j + 1)(b^j - a^j)}\right]^{1/x}$

for the case $x \ne -j$. This mean, set up as a mean of an infinite number of variates,
turns out to be also a mean of the two numbers $a$ and $b$, which for $b = a$
becomes indeterminate. But as $b$ approaches $a$, so also does $\phi_j(x)$ approach $a$.
This is also true for the special cases $x = -j$, etc.
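A small numerical sketch of this limit follows. It assumes that (6.3) reads $[(b^{x+j} - a^{x+j}) / ((x/j + 1)(b^j - a^j))]^{1/x}$; the function name and the trial values of $a$, $b$, $x$, $j$ are hypothetical.

```python
def cisbani_two_number_mean(a, b, x, j):
    """Expression (6.3) as read above; needs a != b, j != 0, x != 0, x != -j."""
    ratio = (b ** (x + j) - a ** (x + j)) / ((x / j + 1) * (b ** j - a ** j))
    return ratio ** (1.0 / x)


# A value lying between the two numbers a = 1 and b = 4.
print(cisbani_two_number_mean(1.0, 4.0, 2, 1))

# As b approaches a = 3 the value approaches 3 as well.
for gap in (1.0, 1e-2, 1e-6):
    print(gap, cisbani_two_number_mean(3.0, 3.0 + gap, 2, 1))
```

For $x = 2$, $j = 1$ the expression reduces to $[(a^2 + ab + b^2)/3]^{1/2}$, which makes the approach to $a$ as $b \to a$ easy to see by hand.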

In testing to see if a function $m$ of $x_i$ is a mean of these numbers, a difficulty
sometimes arises, because a substitution of $x_i = c$ and $m = c$ into the equation
which implicitly defines $m$ will put zeros into denominators. An aid in such
testing will now be formulated as a theorem, although the ideas involved are
not essentially new.





Theorem. Let $f(x)$ be a continuous increasing function of $x$ defined for each
real $x$. Let

(6.4) $f(0) = 0.$

Given $n$ real distinct numbers

(6.5) $x_1 < x_2 < \cdots < x_{n-1} < x_n,$

$n$ positive numbers, $k_i$, and a real number $C$.

Set

(6.6) $F(x) = \dfrac{k_1}{f(x_1 - x)} + \cdots + \dfrac{k_n}{f(x_n - x)} - C.$

Then

(6.7) $F(x) = 0$

has $n - 1$ real roots $m_j$, such that
$x_1 < m_1 < x_2 < m_2 < \cdots < m_{n-1} < x_n$;

also, a root less than $x_1$, provided

(6.8) $\sum k_i / f(+\infty) < C;$

or a root greater than $x_n$, provided

(6.9) $\sum k_i / f(-\infty) > C.$

Proof. Since $f(x)$ is a continuous increasing function of $x$, so also is
$k_i/f(x_i - x)$, except for the single value, $x = x_i$. So also, then, is $F(x)$, except
when $x = x_1$ or $x_2$ or $\cdots$ or $x_n$. But

(6.10) $F(x_i + 0) = -\infty; \quad F(x_{i+1} - 0) = +\infty.$

Hence, between $x_i$ and $x_{i+1}$, there exists a root $m_i$ of $F(x) = 0$.

Moreover, since

(6.11) $F(-\infty) = [\sum k_i / f(+\infty)] - C; \quad F(x_1 - 0) = +\infty;$

it follows that there is a root less than $x_1$, provided (6.8) is satisfied. Likewise,
there is a root greater than $x_n$ if (6.9) is satisfied.

The use of this theorem in testing for means is simple. Keeping the $x_i$
distinct, the equation $F(x) = 0$ determines $(n - 1)$ numbers, $m_j$, such that if
$x_i \to c$, so also do these $m_j \to c$. Employing continuity to define $m_j$ when each
$x_i = c$, we may say that each $m_j$ is a mean of $x_i$; $j = 1, 2, \ldots, (n - 1)$; $i =
1, 2, \ldots, n$, when the conditions of this theorem are satisfied. If $F(x) = 0$ has
still another root, $m$, this $m$ will not in general be a mean of $x_i$.
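The interlacing of the roots asserted by the theorem can be illustrated numerically. The sketch below takes $F(x) = \sum k_i / f(x_i - x) - C$ with the simplest admissible choice $f(u) = u$ and locates one root in each interval $(x_i, x_{i+1})$ by bisection. The function names, the sample data and the bisection tolerances are illustrative assumptions.

```python
def F(x, xs, ks, C, f=lambda u: u):
    """F(x) = sum_i k_i / f(x_i - x) - C, for an increasing f with f(0) = 0."""
    return sum(k / f(xi - x) for xi, k in zip(xs, ks)) - C


def root_between(lo, hi, xs, ks, C, f=lambda u: u, iters=200):
    """Bisection on (lo, hi), where F rises from -infinity to +infinity."""
    gap = hi - lo
    a, b = lo + 1e-12 * gap, hi - 1e-12 * gap
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if F(mid, xs, ks, C, f) < 0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


xs = [1.0, 2.0, 4.0, 7.0]   # distinct x_i, in increasing order
ks = [1.0, 2.0, 1.0, 3.0]   # positive weights k_i
C = 0.5
roots = [root_between(xs[i], xs[i + 1], xs, ks, C) for i in range(len(xs) - 1)]
print(roots)
```

Because $F$ is increasing on each open interval between consecutive $x_i$, bisection is sufficient here; no derivative information is needed.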


7. Summational Means arising in the Estimation of Parameters of Frequency
Distributions. In curve fitting, the estimation of parameters leads in general
to summational means. If the method of moments is used, the first step is to
find the moments by summation. I have already considered estimates for
location and scale by this method [7], and by the R. A. Fisher method of maximum
likelihood [4]. A further study of the results of the likelihood method will
now be made.

By this method, products which first appear are reduced to sums by logarithms,
and the means found are, in general, summational. Some idea of the
forms of these means can be obtained by examining a rather general form of
frequency function which includes the Pearson Type I, and involves parameters
with estimates $p > 0$ and $q > 0$, in addition to the location $m$ and scale $a$.
Let the observations be $x_1, x_2, \ldots, x_n$; let

(7.1) $t_i = (x_i - m)/a; \quad 0 \le t \le 1; \quad a > 0;$

(7.2) $y = \dfrac{1}{a}\,\dfrac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)}\, t^{p-1}(1 - t)^{q-1}.$

The likelihood $L$ is obtained by multiplying together the $n$ factors obtained
by substituting $t = t_1, t_2, \ldots, t_n$.

Then

(7.3) $\log L = -n \log a + n \log \Gamma(p + q) - n \log \Gamma(p) - n \log \Gamma(q) + (p - 1) \sum \log t_i + (q - 1) \sum \log (1 - t_i).$


From $\partial L/\partial m = 0$, there is obtained

(7.4) $P \sum \dfrac{1}{x_i - m} + Q \sum \dfrac{1}{x_i - m - a} = 0; \quad P = p - 1, \quad Q = q - 1.$

Suppose $P \ne 0$ and $Q \ne 0$; and as a first case, suppose $P + Q \ne 0$. If each
$x_i$ is replaced by $x$, the above equation leads to $m = x - (Pa)/(P + Q)$.

Then $m$ is a summational mean of

(7.5) $x_i' = x_i - (Pa)/(P + Q), \quad i = 1, 2, \ldots, n;$

as seen by applying the Theorem in Section 5.

Likewise, $a$ is a summational mean of

(7.6) $x_i'' = (x_i - m)(P + Q)/P.$

If $P \ne 0$, $Q \ne 0$; but $P + Q = 0$, then (7.4) becomes

(7.7) $\sum \dfrac{1}{x_i - m - a} = \sum \dfrac{1}{x_i - m}.$

Now set $y_i = x_i - m$, $C = \sum 1/y_i$; and write (7.7) as

(7.8) $F(a) = \sum \dfrac{1}{y_i - a} - C = 0.$

This has the form given in (6.6) with $x$ replaced by $a$, $k_i = 1$, $f(a) = a$. If then
$y_1 < y_2 < \cdots < y_n$, there exist $(n - 1)$ solutions $a_j$ of $F(a) = 0$ between $y_1$
and $y_n$. And thus keeping the $y_i$ distinct, if $y_i \to c$, so also do the $a_j \to c$.

These $a_j$ are then means of $y_i$, and thus, means of $x_i - m$.

In the more general case where $P + Q \ne 0$, it is seen also that $Q$ is a summational
mean of

(7.9) $P\left(\dfrac{a}{x_i - m} - 1\right).$

From $\partial L/\partial a = 0$, quite analogous results are obtained. The special case
now, however, is given by $P + Q + 1 = 0 = p + q - 1$. And, with the
continuity interpretation, $a$ is a mean of $x_i - m$; and moreover, $m$ is a mean of
$x_i - a$.

Using now the digamma function

(7.10) $\digamma(u) = \dfrac{d}{du} \log \Gamma(u),$

set

(7.11) $D(p) = \digamma(p + q) - \digamma(p).$

The condition $\partial L/\partial p = 0$ then leads to

(7.12) $D(p) = (1/n) \sum (-\log t_i), \quad 0 < t_i \le 1.$

Now, with $q > 0$, $D(\infty) = 0$, $D(-1 + 0) = \infty$; and $D(p)$ is a continuous
decreasing function of $p$, when $p > -1$. Then, since $-\log t_i > 0$, there is a
unique $p > -1$ to satisfy (7.12).

To be useful, here, $p$ should be $> 0$. But, at all events, the $p$ thus found is
a mean of $D^{-1}(-\log t_i)$, where $D^{-1}$ is inverse to $D$.

The digamma function (7.10) appears also in estimating the parameters for
the Pearson Type III,

(7.13) $y = \dfrac{1}{a\,\Gamma(p + 1)}\, e^{-t}\, t^{p}, \quad t = (x - m)/a, \quad p > -1.$

By setting $\partial L/\partial p = 0$, it is found that $m$ is the arithmetic mean of $x_i - a e^{\digamma(p+1)}$;
$a$ is the arithmetic mean of $(x_i - m) e^{-\digamma(p+1)}$; while $p$ is a summational mean of
$\digamma^{-1}[\log (x_i - m)/a] - 1$, where $\digamma^{-1}$ is the inverse of $\digamma$. From $\partial L/\partial m = 0$, it
is found that $m$ is a summational mean of $x_i - pa$; $a$ is the harmonic mean of
$(x_i - m)/p$; and $p$ is the harmonic mean of $(x_i - m)/a$. Finally, from $\partial L/\partial a = 0$,
there is obtained

(7.14) $(1/n) \sum x_i = m + a(p + 1),$

which makes $m$, $a$ and $p$ each an arithmetic mean of a simple function of the
observations, when the other two estimates are taken as constants.

Comparison of (5.2) with (5.8) has shown that after complete elimination,
estimates may cease to be means. However, it may be noted that $s$ is more
frequently exhibited in the form (5.2) where it is a mean than in the form (5.8)
where it is not.





8. Generalizations. The extension of results from the discrete or discontinuous
case where a mean $m$ depends upon only a finite number of elements to the
continuous case is fairly immediate, with integration taking the place of summation,
and a distribution or frequency function taking the place of discrete weights,
$c_i$. Stieltjes and Lebesgue integrals may be used as well as Riemannian. Such
a generalization of the Chisini mean was given by de Finetti [2].

The summational mean, which I have defined as involving possibly several
summations, may be generalized likewise.

In terms of set functions, sometimes called functionelles, I gave [35] the following
general definition of a mean with a point set $H$ in mind as a distribution
function.

Definition. Let $E$ and $H$ be sets of numbers. Such a number $t$ may be a real
number or a vector number $t = (t_1, t_2, \ldots, t_k)$.

Let $E_t$ be the result of replacing each number of $E$ by a single number $t$.

Then the mean $m$ of numbers in $E$, relative to the set $H$, and to a function $f$, is
given by $m = f(E, H)$; provided that the function $f$ has been so constructed that
for each $t$ in $E$, $f(E_t, H) = t$, or at least one value of this $f$ is $t$. It is to be understood
above that when $E$ is changed to $E_t$, the set $H$ remains unaltered.

This retains the chief feature of $f(t, t, \ldots, t) = t$ in explicit form or of
$f(t, t, \ldots, t) = f(t_1, t_2, \ldots, t_n)$ in implicit form, where $t$ is a mean of $t_1, t_2, \ldots, t_n$.

I used [36] a somewhat less general definition to discuss regression coefficients.
All such means may well be called substitutive or representative.

REFERENCES

[1] O. Chisini, “Sul concetto di media.” Periodico di Mat., Ser. 4, Vol. 9 (1929) pp. 106-116.

[2] Bruno de Finetti, “Sul concetto di media.” Giornale dell’ Inst. Italiano degli Attuari, Vol. 2 (1931) pp. 369-396. See p. 378.

[3] E. L. Dodd, “The complete independence of certain properties of means.” Annals of Math., Vol. 35 (1934) pp. 740-747.

[4] E. L. Dodd, “Internal and external means arising from the scaling of frequency functions.” Annals of Math. Stat., Vol. 8 (1937) pp. 12-20.

[5] E. L. Dodd, “Regression coefficients as means of certain ratios.” Am. Math. Monthly, Vol. 44 (1937) pp. 306-308.

[6] E. L. Dodd, “Index numbers and regression coefficients as means, internal and external.” Report of Third Annual Research Conference on Economics and Statistics. Cowles Commission for Research in Economics, Colorado Springs, Colo., pp. 13, 14.

[7] E. L. Dodd, “Interior and exterior means obtained by the method of moments.” Annals of Math. Stat., Vol. 9 (1938) pp. 153-157.

[8] O. Suto, “Law of the arithmetic mean.” Tôhoku Math. Jour., Vol. 6 (1914-5) pp. 79-81.

[9] E. V. Huntington, “Sets of independent postulates for the arithmetic mean, the geometric mean, the harmonic mean, and the root-mean-square.” Trans. Am. Math. Soc., Vol. 29 (1927) pp. 1-22.

[10] S. Narumi, “Note on the law of the arithmetic mean.” Tôhoku Math. Jour., Vol. 30 (1929) pp. 19-21.

[11] L. Teodoriu, “Sur la définition axiomatique de la moyenne.” Mathematicae, Vol. 5 (1931) pp. 27-31.

[12] I. Nakahara, “Axioms for the weighted means.” Tôhoku Math. Jour., Vol. 41 (1936) pp. 424-434.

[13] C. Gini and L. Galvani, “Di talune estensioni dei concetti di media ai caratteri qualitativi.” Metron, Vol. 8 (1929) pp. 3-209.

[14] G. Pietra, “The theory of statistical relations with special reference to cyclical series.” Metron, Vol. 4 (1926) pp. 383-667.

[15] G. Schiaparelli, “Come si possa giustificare l’uso della media aritmetica nel calcolo dei risultati d’osservazione.” Rendiconti Regio Inst. Lombardo, Ser. 2, Vol. 40 (1907) pp. 762-764.

[16] U. Broggi, “Sur le principe de la moyenne arithmétique.” L’Enseignement, Vol. 11 (1909) pp. 14-17.

[17] R. Schimmack, “Der Satz vom arithmetischen Mittel in axiomatischer Begründung.” Math. Annalen, Vol. 68 (1910) pp. 126-132.

[18] S. Matsumura, “Über die Axiomatik von Mittelbildungen.” Tôhoku Math. Jour., Vol. 36 (1933) pp. 260-262.

[19] C. Gini, “Di una formula comprensiva delle medie.” Metron, Vol. 13 (1938) pp. 3-22.

[20] C. E. Bonferroni, “Sulle medie dedotte da funzioni concave.” Giornale di Mat. Finanziaria, Vol. 9 (1927) pp. 13-24.

[21] M. Nagumo, “Über eine Klasse der Mittelwerte.” Japanese Jour. of Math., Vol. 7 (1930) pp. 71-79.

[22] A. Kolmogoroff, “Sur la notion de la moyenne.” Atti Regio Accademia Lincei, Ser. 6, Vol. 12 (1930) pp. 388-391.

[23] C. E. Bonferroni, “Scadenza media e leggi di capitalizzazione.” Giornale di Mat. Finanziaria, Vol. 3 (1921) pp. 1-26.

[24] C. E. Bonferroni, “Sulla scadenza media in senso lato.” Giornale di Mat. Finanziaria, Vol. 4 (1922) pp. 160-181.

[25] C. E. Bonferroni, “La media esponenziale in matematica finanziaria.” Regio Istituto Superiore di Scienze Economiche e Commerciali, Bari, 1923-24.

[26] C. E. Bonferroni, “Della vita matematica come media esponenziale,” and Note. Ibid., 1924-26.

[27] C. E. Bonferroni, “Sulle medie di potenze.” Giornale degli Economisti e Rivista di Statistica, Vol. 66 (1926) pp. 292-296.

[28] C. E. Bonferroni, “A proposito di espressioni generali per le medie.” Giornale degli Economisti e Rivista di Statistica, Vol. 77 (1937) pp. 364-367.

[29] E. L. Dodd, “Functions of measurements under general laws of error.” Skandinavisk Aktuarietidskrift, Vol. 5 (1922) pp. 133-158.

[30] Dunham Jackson, “Note on the median of a set of numbers.” Bull. Am. Math. Soc., Vol. 27 (1921) pp. 160-164.

[31] A. Durand, “Sur un théorème relatif à des moyennes.” Bull. des Sciences Math., Vol. 26 (1902) pp. 181-183.

[32] O. Dunkel, “Generalized geometric means and algebraic equations.” Annals of Math., Vol. 11 (1909-10) pp. 21-32.

[33] R. Cisbani, “Contributi alla teoria delle medie.” Metron, Vol. 13 (1938) pp. 23-69.

[34] L. Galvani, “Dei limiti a cui tendono alcune medie.” Boll. Un. Mat. Italiana, 1927.

[35] E. L. Dodd, “The chief characteristic of statistical means.” Cowles Commission for Research in Economics. Colorado College Publication, General Series No. 208 (1936) pp. 89-92.

[36] E. L. Dodd, “Certain coefficients of regression or trend associated with largest likelihood.” Actualités Scientifiques et Industrielles, No. 740 (1938) pp. 6-14.

[37] G. Pietra, “Intorno alle medie combinatorie,” Supplemento statistico, etc., Ser. 2, Vol. 4 (1938). This reference is cited in “I Contributi Italiani al Progresso della Statistica,” Società Italiana per il Progresso delle Scienze, September 1939 on page 16, as dealing with the expression of power and combinatorial means, monoplane and biplane, as sums of power.


University of Texas, 
Austin, Texas. 



THE PRODUCT SEMI-INVARIANTS OF THE MEAN AND A 
CENTRAL MOMENT IN SAMPLES 

By Cecil C. Craig 

The method developed by the author for calculating the semi-invariants and
product semi-invariants of moments in samples from any infinite population¹
is not immediately applicable to the calculation of product semi-invariants of
the mean and a central moment in such samples. In the present paper this
method is adapted for this purpose so that the calculation of these product
semi-invariants becomes routine. As it will be seen, the computing is a little
heavier than in the case of central moments alone for results of equal weight.
A table of results up to weight ten for the mean and the second, third and fourth 
central moments is given. The author plans to apply these to a further study 
of the sampling characteristics of the coefficient of variation and Fisher’s t in 
samples from non-normal populations. 

Let a random sample, $x_1, x_2, \ldots, x_N$, of $N$ observations be drawn at random
from an infinite population characterized by the semi-invariants, $\lambda_1, \lambda_2, \lambda_3, \ldots$.
The sample mean is,

$\bar{x} = \sum_{i=1}^{N} x_i / N,$

and the $n$-th central moment of the sample is

$m_n = \sum_{i=1}^{N} (x_i - \bar{x})^n / N.$

Then the product semi-invariants of order $kl$ of $\bar{x}$ and $m_n$, $S_{kl}(\bar{x}, m_n)$, are defined
by the formal identity in the parameters $\vartheta$ and $\omega$:

(1) $(S_{10}\vartheta + S_{01}\omega) + \dfrac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \dfrac{1}{3!}(S_{10}\vartheta + S_{01}\omega)^{(3)} + \cdots = \log E(e^{\bar{x}\vartheta + m_n\omega}),$

in which $E$ denotes the mathematical expectation over the set of all such
samples and

$(S_{10}\vartheta + S_{01}\omega)^{(r)} = \sum_{s=0}^{r} \binom{r}{s} S_{r-s,\,s}\, \vartheta^{r-s} \omega^{s}.$

¹ “An Application of Thiele’s Semi-invariants to the Sampling Problem,” Metron, Vol.
VII, part IV (1928), pp. 8-75.




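If we denote $E(\bar{x}^k m_n^l)$ by $M_{kl}$, we have by definition the further formal identity
in $\vartheta$ and $\omega$:

$E(e^{\bar{x}\vartheta + m_n\omega}) = 1 + (M_{10}\vartheta + M_{01}\omega) + \dfrac{1}{2!}(M_{10}\vartheta + M_{01}\omega)^{(2)} + \cdots$

in which $(M_{10}\vartheta + M_{01}\omega)^{(r)}$ is to be expanded in the same manner as
$(S_{10}\vartheta + S_{01}\omega)^{(r)}$ above.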

Let us write

$\delta_i = x_i - \bar{x},$

and then

(2) $e^{\bar{x}\vartheta + m_n\omega} = e^{(\sum x_i)\vartheta/N + (\sum \delta_i^n)\omega/N}.$


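(Summations with respect to $i$ and $j$ always run from 1 to $N$.) Now we define
a new set of product semi-invariants, $\lambda_{rst\cdots}$, of the sum $\sum x_i$ and the $N$ $\delta_i$'s, by
means of

$(\lambda_{10}\vartheta + \sum \lambda_{01}\omega_i) + \dfrac{1}{2!}(\lambda_{10}\vartheta + \sum \lambda_{01}\omega_i)^{(2)} + \cdots = \log E(e^{(\sum x_i)\vartheta + \sum \delta_i\omega_i}),$

in which for example,

$\left(\lambda_{10}\vartheta + \sum_{i=1}^{3} \lambda_{01}\omega_i\right)^{(2)} = \lambda_{2000}\vartheta^2 + 2\lambda_{1100}\vartheta\omega_1 + 2\lambda_{1010}\vartheta\omega_2 + \cdots + \lambda_{0200}\omega_1^2 + \lambda_{0020}\omega_2^2 + \lambda_{0002}\omega_3^2.$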


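We may set

$\delta_i = \sum_j a_{ij} x_j \quad \text{with} \quad a_{ij} = -\dfrac{1}{N},\ i \ne j; \qquad a_{ii} = \dfrac{N - 1}{N}.$

Then

$E(e^{(\sum x_i)\vartheta + \sum \delta_i\omega_i}) = E(e^{\sum_i x_i(\vartheta + \sum_j a_{ij}\omega_j)}) = E(e^{\alpha_1 x_1}) \cdot E(e^{\alpha_2 x_2}) \cdots E(e^{\alpha_N x_N}),$

in which

$\alpha_i = \vartheta + \sum_j a_{ij}\omega_j.$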


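It follows then that

$(\lambda_{10}\vartheta + \sum \lambda_{01}\omega_i) + \dfrac{1}{2!}(\lambda_{10}\vartheta + \sum \lambda_{01}\omega_i)^{(2)} + \dfrac{1}{3!}(\lambda_{10}\vartheta + \sum \lambda_{01}\omega_i)^{(3)} + \cdots \equiv \lambda_1 \sum \alpha_i + \dfrac{\lambda_2}{2!} \sum \alpha_i^2 + \dfrac{\lambda_3}{3!} \sum \alpha_i^3 + \cdots$

from which

$(\lambda_{10}\vartheta + \sum \lambda_{01}\omega_i)^{(k+l)} = \lambda_{k+l} \sum_i \left(\vartheta + \sum_j a_{ij}\omega_j\right)^{k+l}.$

From this

$\lambda_{k00\cdots0} = \lambda_{k0} = N\lambda_k,$

$\lambda_{k10\cdots0} = \lambda_{k010\cdots0} = \cdots = 0,$

and generally,²

(3) $\lambda_{k l_1 l_2 \cdots l_N} = \dfrac{\lambda_{k+l}}{N^l} \left[\sum_i (-1)^{l - l_i} (N - 1)^{l_i}\right], \quad (l_1 + l_2 + \cdots + l_N = l).$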

This is the first result to be used in calculating values of $S_{kl}$'s. Note that the
value of $\lambda_{k l_1 l_2 \cdots l_N}$ is independent of the order in which a given set of $l_i$'s occur.

Calculation of particular $\lambda_{k l_1 l_2 \cdots l_N}$'s in terms of $N$ and the semi-invariants of
the sampled population is both simple and rapid as one may see from a pair
of examples:

$\lambda_{k2} = \lambda_{k02} = \lambda_{k002} = \cdots$

(suppressing superfluous zeros in the subscripts)

$= \dfrac{\lambda_{k+2}}{N^2} [(N - 1)^2 + (N - 1)] = \dfrac{N - 1}{N} \lambda_{k+2}.$

Then, too,

$\lambda_{k11} = -\dfrac{\lambda_{k+2}}{N}.$

For a second example:

$\lambda_{k34} = \dfrac{\lambda_{k+7}}{N^7} [-(N - 1)^4 + (N - 1)^3 - (N - 2)] = -\dfrac{(N - 2)(N^2 - 3N + 3)}{N^6} \lambda_{k+7}.$
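Result (3) lends itself to a direct computation. The sketch below reads (3) as $\lambda_{k l_1 \cdots l_N} = (\lambda_{k+l}/N^l) \sum_i (-1)^{l - l_i}(N-1)^{l_i}$ and returns the rational multiplier of $\lambda_{k+l}$; the function name and the convention of passing only the nonzero $l_i$ are illustrative assumptions, not the author's notation.

```python
from fractions import Fraction


def lambda_coefficient(N, ls):
    """Multiplier c in lambda_{k l_1 ... l_N} = c * lambda_{k+l}, from (3).

    `ls` lists the nonzero l_i; the remaining N - len(ls) indices are zero.
    """
    if len(ls) > N:
        raise ValueError("more indices than observations")
    l = sum(ls)
    padded = list(ls) + [0] * (N - len(ls))
    total = sum((-1) ** (l - li) * (N - 1) ** li for li in padded)
    return Fraction(total, N ** l)


N = 5
print(lambda_coefficient(N, [2]))      # multiplier for lambda_{k2}
print(lambda_coefficient(N, [1, 1]))   # multiplier for lambda_{k11}
print(lambda_coefficient(N, [3, 4]))   # multiplier for lambda_{k34}
```

Exact rational arithmetic is used so that the multipliers can be compared with the closed forms $(N-1)/N$, $-1/N$ and $-(N-2)(N^2-3N+3)/N^6$ without rounding.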

Now the semi-invariants, $S_{kl}$, can be expressed directly in terms of the
product moments, $\nu_{k l_1 l_2 \cdots l_N}$, of the sum $\sum x_i$ and the $N$ $\delta_i$'s. These product
moments are given by the appropriate moment generating function:

$E(e^{(\sum x_i)\vartheta + \sum \delta_i\omega_i}) = 1 + (\nu_{10}\vartheta + \sum \nu_{01}\omega_i) + \dfrac{1}{2!}(\nu_{10}\vartheta + \sum \nu_{01}\omega_i)^{(2)} + \cdots.$

² As written this result is valid if at least one of the $l_i$'s is zero, which is always the
case if $N$, the size of the sample, is greater than $l$. (Cf. the author’s paper cited above,
p. 17.)






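Then it is seen that,

$E(e^{(\sum x_i)\vartheta + (\sum \delta_i^n)\omega}) \equiv 1 + [\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega] + \dfrac{1}{2!}[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^{(2)} + \cdots,$

in which

$[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^{(2)} = \nu_{20}\vartheta^2 + 2(\nu_{1n} + \nu_{10n} + \nu_{100n} + \cdots)\vartheta\omega + (\nu_{0,2n} + \nu_{00,2n} + \nu_{000,2n} + \cdots)\omega^2$

etc., and by comparison with (1) and (2), we have

$(S_{10}\vartheta + S_{01}\omega) + \dfrac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \cdots \equiv \log \left\{1 + \dfrac{1}{N}[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega] + \dfrac{1}{2!\,N^2}[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^{(2)} + \cdots\right\}.$

From this

(4) $(S_{10}\vartheta + S_{01}\omega)^{(k+l)} = \dfrac{1}{N^{k+l}} \sum \dfrac{(-1)^{\rho-1}(\rho - 1)!\,(k + l)!\,[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^{r}\,\{[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^{(2)}\}^{s} \cdots}{(1!)^r (2!)^s \cdots r!\,s! \cdots},$

in which

$r + s + t + \cdots = \rho,$

the summation extending over all partitions $(1^r 2^s 3^t \cdots)$ of $k + l$. This, of
course, is only the usual formula for semi-invariants in terms of moments
appropriately modified. In particular,

$(S_{10}\vartheta + S_{01}\omega)^{(2)} = \dfrac{1}{N^2} \{[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^{(2)} - [\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega]^2\}.$

If we write

$[\nu_{10}\vartheta + (\sum \nu_{0,n_i})\omega] = W,$

(5) $(S_{10}\vartheta + S_{01}\omega)^{(3)} = \dfrac{1}{N^3} (W^{(3)} - 3W^{(2)}W + 2W^3),$

$(S_{10}\vartheta + S_{01}\omega)^{(4)} = \dfrac{1}{N^4} [W^{(4)} - 4W^{(3)}W - 3(W^{(2)})^2 + 12W^{(2)}W^2 - 6W^4].$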
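The numerical coefficients in (5) are those of the usual expressions for the third and fourth semi-invariants in terms of moments about the origin. As a check on the coefficients only, the sketch below applies the same pattern to a single variable, with $W^{(r)}$ replaced by the $r$-th raw moment. The function name is hypothetical, and the test case (an exponential population with unit mean, whose $r$-th raw moment is $r!$ and whose $r$-th semi-invariant is $(r-1)!$) is an illustration added here, not taken from the paper.

```python
from math import factorial


def cumulants_from_raw_moments(m1, m2, m3, m4):
    """Second, third and fourth semi-invariants from raw moments.

    The third and fourth lines use the coefficient pattern of (5):
    (1, -3, 2) and (1, -4, -3, 12, -6).
    """
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return k2, k3, k4


# Exponential population with unit mean: raw moments 1!, 2!, 3!, 4!.
raw = [factorial(r) for r in range(1, 5)]
print(cumulants_from_raw_moments(*raw))
```

In the bivariate setting of the text the same coefficients multiply the symbolic powers $W^{(r)}$, and the factor $1/N^{k+l}$ comes from the divisors $N$ in $\bar{x}$ and $m_n$.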

Now the $\nu_{k l_1 l_2 \cdots l_N}$'s can be replaced by their values in terms of the $\lambda_{k l_1 l_2 \cdots l_N}$'s,
the details of which will be explained below, and it will be evident that any
$\nu_{k l_1 l_2 \cdots l_N}$ is unaltered by a permutation of the $l_i$'s in its subscript. Taking
account of this, the formulae (5) may be written in the expanded forms:

$S_{11}(\bar{x}, m_n) = \dfrac{1}{N} [\nu_{1n} - \nu_{10}\nu_{0n}]$

$S_{21}(\bar{x}, m_n) = \dfrac{1}{N^2} [\nu_{2n} - \nu_{20}\nu_{0n} - 2\nu_{1n}\nu_{10} + 2\nu_{10}^2\nu_{0n}]$

$S_{12}(\bar{x}, m_n) = \dfrac{1}{N^2} [\nu_{1,2n} + (N - 1)\nu_{1nn} - \nu_{10}\nu_{0,2n} - (N - 1)\nu_{10}\nu_{0nn} - 2N\nu_{1n}\nu_{0n} + 2N\nu_{10}\nu_{0n}^2].$

But, with no loss in generality, the origin may be taken at the population mean
so that $\lambda_1 = 0$. In this case it will be found that $\nu_{10} = 0$ and these formulae
become:

$S_{11}(\bar{x}, m_n) = \nu_{1n}/N$

$S_{21}(\bar{x}, m_n) = \dfrac{1}{N^2} [\nu_{2n} - \nu_{20}\nu_{0n}]$

$S_{12}(\bar{x}, m_n) = \dfrac{1}{N^2} [\nu_{1,2n} + (N - 1)\nu_{1nn} - 2N\nu_{1n}\nu_{0n}]$

$S_{31}(\bar{x}, m_n) = \dfrac{1}{N^3} [\nu_{3n} - \nu_{30}\nu_{0n} - 3\nu_{1n}\nu_{20}]$

$S_{22}(\bar{x}, m_n) = \dfrac{1}{N^3} [\nu_{2,2n} + (N - 1)\nu_{2nn} - 2N\nu_{2n}\nu_{0n} - \nu_{20}\nu_{0,2n} - (N - 1)\nu_{20}\nu_{0nn} - 2N\nu_{1n}^2 + 2N\nu_{20}\nu_{0n}^2]$

$S_{13}(\bar{x}, m_n) = \dfrac{1}{N^3} [\nu_{1,3n} + 3(N - 1)\nu_{1,2n,n} + (N - 1)(N - 2)\nu_{1nnn} - 3N\nu_{1,2n}\nu_{0n} - 3N(N - 1)\nu_{1nn}\nu_{0n} - 3N\nu_{1n}\nu_{0,2n} - 3N(N - 1)\nu_{1n}\nu_{0nn} + 6N^2\nu_{1n}\nu_{0n}^2].$

These formulae are the second result used in the actual calculation of
$S_{kl}(\bar{x}, m_n)$'s. One begins with them, putting in the particular value of $n$ for
the central moment in question. If for instance we wish to compute the product
semi-invariants of the mean and variance in samples of $N$, we begin with the
set of formulae:

$S_{11}(\bar{x}, m_2) = \nu_{12}/N$

$S_{21}(\bar{x}, m_2) = \dfrac{1}{N^2} [\nu_{22} - \nu_{20}\nu_{02}]$

$S_{12}(\bar{x}, m_2) = \dfrac{1}{N^2} [\nu_{14} + (N - 1)\nu_{122} - 2N\nu_{12}\nu_{02}],$

The second step is to replace the product moments which appear by
their values in terms of the corresponding product semi-invariants. This process
can perhaps be best explained by some examples.






Consider the complete calculation of $S_{12}(\bar{x}, m_2)$. From the expression for the
fifth central moment in terms of semi-invariants:

$\nu_5 = \lambda_5 + 10\lambda_3\lambda_2,$

we can write the corresponding expression for product moments in terms of
product semi-invariants

(8) $(\sum \nu_i)^{(5)} = (\sum \lambda_i)^{(5)} + 10(\sum \lambda_i)^{(3)}(\sum \lambda_i)^{(2)}.$

Then we get $\nu_{14}$ by comparing coefficients of $\dfrac{\vartheta\omega_1^4}{1!\,4!}$ and $\nu_{122}$ by comparing
coefficients of $\dfrac{\vartheta\omega_1^2\omega_2^2}{1!\,2!\,2!}$ in this identity. For an index as low as 5, these coefficients
are readily picked out by inspection; for larger indices the use of Hammond
operators reduces this to a mechanical routine.³ In this case we have

$D_3D_2(14) = (12)(02) + (03)(11).$

To the terms on the right the appropriate binomial coefficients must be applied
giving

$3(12)(02) + 2(03)(11).$

The total of these coefficients is $5 = \dfrac{5!}{4!\,1!}$, a necessary check. Then multiplying
these coefficients by 10/5, we have

$6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}$

for the required coefficients in the second term in (8). Thus

$\nu_{14} = \lambda_{14} + (6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}).$

The two terms in parentheses arise from the same term in (8) and would both
give rise to terms in $\lambda_3\lambda_2$ in the final result if $\lambda_{11}$ were not identically zero from (3).
In practice all terms in which $\lambda_{11}$ is a factor are crossed out as they appear.
Next

$D_3D_2(122) = 2(12)(02) + (111)(011) + 2(021)(11).$

($\lambda_{002} = \lambda_{02}$; $\lambda_{012} = \lambda_{021}$.) With the binomial, or multinomial, coefficients attached,
the right member is rewritten

$6(12)(02) + 12(111)(011) + 12(021)(11).$

³ Cf. the author, loc. cit., p. 24.





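The total of these coefficients is $30 = \dfrac{5!}{2!\,2!\,1!}$. Then multiplying each coefficient
by 10/30, we have

$\nu_{122} = \lambda_{122} + (2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011} + 4\lambda_{012}\lambda_{11}).$

Going on with the calculation of $S_{12}(\bar{x}, m_2)$:

$\nu_{12} = \lambda_{12}, \quad \nu_{02} = \lambda_{02},$

and then we have:

$S_{12}(\bar{x}, m_2) = \dfrac{1}{N^2} [\{\lambda_{14} + (N - 1)\lambda_{122}\} + \{6\lambda_{12}\lambda_{02} + (N - 1)(2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011}) - 2N\lambda_{12}\lambda_{02}\}].$

The first set of terms within braces gives rise to terms in $\lambda_5$; the second to terms
in $\lambda_3\lambda_2$. Next

$\lambda_{14} = \dfrac{(N - 1)(N^2 - 3N + 3)}{N^3} \lambda_5 \qquad \lambda_{111} = -\dfrac{\lambda_3}{N}$

$\lambda_{122} = \dfrac{2N - 3}{N^3} \lambda_5 \qquad \lambda_{02} = \dfrac{N - 1}{N} \lambda_2$

$\lambda_{03} = \dfrac{(N - 1)(N - 2)}{N^2} \lambda_3 \qquad \lambda_{011} = -\dfrac{\lambda_2}{N}$

$\lambda_{012} = -\dfrac{N - 2}{N^2} \lambda_3$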

This table of values will be of frequent use in further calculations of $S_{kl}$'s.
Giving the values of both $\lambda_{111}$ and $\lambda_{011}$ here was unnecessary duplication.

Now only the final reduction is to be carried out. We obtain

$S_{12}(\bar{x}, m_2) = \dfrac{N - 1}{N^4} [(N - 1)\lambda_5 + 4N\lambda_3\lambda_2].$


This result of order 3 and of weight 5 follows a quite mechanical procedure 
and is quite brief. The length of the algebraic computations required grows 
rapidly as the weight is increased but for weights no greater than 10 undue labor 
is not required. For greater weights only time and patience is required to get 
results if they are needed. It is to be noted that by this method one may 
calculate individual terms in the result without doing any of the work required 
for the remaining terms and that one may readily shorten the work by getting 
results to a desired degree of approximation with respect to powers of 1/N. 
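A result of this kind can be verified exactly on a small discrete population. The sketch below enumerates all samples of $N$ independent Bernoulli observations and compares the joint third-order semi-invariant of $\bar{x}$ and $m_2$ with $\dfrac{N-1}{N^4}[(N-1)\lambda_5 + 4N\lambda_3\lambda_2]$, and the covariance with $\dfrac{N-1}{N^2}\lambda_3$. The choice of a Bernoulli population, the function names, and the closed forms used for its semi-invariants ($\lambda_2 = pq$, $\lambda_3 = pq(q-p)$, $\lambda_5 = pq(q-p)(1-12pq)$) are assumptions introduced for this check and do not appear in the paper.

```python
from fractions import Fraction
from itertools import product


def joint_semi_invariants(N, p):
    """Exact S11 and S12 of (mean, m2) for N Bernoulli(p) draws, by enumeration."""
    q = 1 - p
    outcomes = []
    for sample in product((0, 1), repeat=N):
        ones = sum(sample)
        prob = p ** ones * q ** (N - ones)
        mean = Fraction(ones, N)
        m2 = sum((x - mean) ** 2 for x in sample) / N
        outcomes.append((prob, mean, m2))
    e_mean = sum(pr * mu for pr, mu, _ in outcomes)
    e_m2 = sum(pr * v for pr, _, v in outcomes)
    # Joint semi-invariants of order two and three equal central product moments.
    s11 = sum(pr * (mu - e_mean) * (v - e_m2) for pr, mu, v in outcomes)
    s12 = sum(pr * (mu - e_mean) * (v - e_m2) ** 2 for pr, mu, v in outcomes)
    return s11, s12


def tabulated(N, p):
    """S11 and S12 for n = 2 from the formulae in the text."""
    q = 1 - p
    k2 = p * q
    k3 = p * q * (q - p)
    k5 = p * q * (q - p) * (1 - 12 * p * q)
    s11 = Fraction(N - 1, N ** 2) * k3
    s12 = Fraction(N - 1, N ** 4) * ((N - 1) * k5 + 4 * N * k3 * k2)
    return s11, s12


p = Fraction(1, 3)
for N in (2, 3, 4):
    print(N, joint_semi_invariants(N, p), tabulated(N, p))
```

Rational arithmetic keeps the comparison exact; the enumeration grows as $2^N$, so it is only meant for very small samples.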

There follows a table of the results so far calculated. 






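For $n = 2$:

$S_{11} = \dfrac{N - 1}{N^2} \lambda_3$

$S_{21} = \dfrac{N - 1}{N^3} \lambda_4$

$S_{12} = \dfrac{N - 1}{N^4} [(N - 1)\lambda_5 + 4N\lambda_3\lambda_2]$

$S_{31} = \dfrac{N - 1}{N^4} \lambda_5$

$S_{22} = \dfrac{N - 1}{N^5} [(N - 1)\lambda_6 + 4N(\lambda_4\lambda_2 + \lambda_3^2)]$

$S_{13} = \dfrac{N - 1}{N^6} [(N - 1)^2\lambda_7 + 12N(N - 1)\lambda_5\lambda_2 + 4N(5N - 7)\lambda_4\lambda_3 + 24N^2\lambda_3\lambda_2^2].$

It is not difficult to see that in general

$S_{k1}(\bar{x}, m_2) = \dfrac{N - 1}{N^{k+1}} \lambda_{k+2}.$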


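For $n = 3$:

$S_{11} = \dfrac{(N - 1)(N - 2)}{N^3} \lambda_4$

$S_{21} = \dfrac{(N - 1)(N - 2)}{N^4} \lambda_5$

$S_{12} = \dfrac{(N - 1)(N - 2)}{N^6} [(N - 1)(N - 2)\lambda_7 + 9N(N - 2)\lambda_5\lambda_2 + 27N(N - 2)\lambda_4\lambda_3 + 18N^2\lambda_3\lambda_2^2]$

$S_{31} = \dfrac{(N - 1)(N - 2)}{N^5} \lambda_6$

$S_{22} = \dfrac{(N - 1)(N - 2)}{N^7} [(N - 1)(N - 2)\lambda_8 + 9N(N - 2)\lambda_6\lambda_2 + 36N(N - 2)\lambda_5\lambda_3 + 27N(N - 2)\lambda_4^2 + 18N^2\lambda_4\lambda_2^2 + 36N^2\lambda_3^2\lambda_2]$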

$ 1 . = [A^(A r - 1) 2 (JV - 2) 2 X W 

+ 9(AT - 1K3AT 4 - 12iV’ + 12AT* - 5N + 5)X*X* 

+ 27AT(4AT 4 - 21AT* + 36tf* - 20AT + 3)X 7 X, 

+ 27 N\N - 2)\7N - 11)X«X4 + MN*(N - 2)(4AT- 7)X,Xj 








+ 272V*(2V - 2)*(4 N - 7)X? + 542V*(2V - 2X232V - 60)X|X,X| 

+ 1622V*(2V - 2)(5N - 12)Xj\, + 54iV*(20AT* - 1262V + 140)X<Xj 
+ mN*(5N - 12)X«Xt + 3242V 4 (52V - 12)XjX*]. 

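For $n = 4$:

$S_{11} = \dfrac{N - 1}{N^4} [(N^2 - 3N + 3)\lambda_5 + 6N(N - 1)\lambda_3\lambda_2]$

$S_{21} = \dfrac{N - 1}{N^5} [(N^2 - 3N + 3)\lambda_6 + 6N(N - 1)(\lambda_4\lambda_2 + \lambda_3^2)]$

$S_{12} = \dfrac{N - 1}{N^8} [(N - 1)(N^2 - 3N + 3)^2\lambda_9$
$\quad + 4N(N^2 - 3N + 3)(7N^2 - 18N + 15)\lambda_7\lambda_2$
$\quad + 4N(N^2 - 3N + 3)(19N^2 - 66N + 63)\lambda_6\lambda_3$
$\quad + 4N(29N^4 - 195N^3 + 537N^2 - 693N + 351)\lambda_5\lambda_4$
$\quad + 12N^2(17N^3 - 71N^2 + 117N - 69)\lambda_5\lambda_2^2$
$\quad + 24N^2(35N^3 - 173N^2 + 309N - 189)\lambda_4\lambda_3\lambda_2$
$\quad + 72N^2(N - 2)^2(3N - 5)\lambda_3^3 + 96N^3(4N^2 - 9N + 6)\lambda_3\lambda_2^3]$

$S_{31} = \dfrac{N - 1}{N^6} [(N^2 - 3N + 3)\lambda_7 + 6N(N - 1)\lambda_5\lambda_2 + 18N(N - 1)\lambda_4\lambda_3]$

$S_{22} = \dfrac{N - 1}{N^9} [(N - 1)(N^2 - 3N + 3)^2\lambda_{10}$
$\quad + 4N(N^2 - 3N + 3)(7N^2 - 18N + 15)\lambda_8\lambda_2$
$\quad + 8N(N^2 - 3N + 3)(13N^2 - 42N + 39)\lambda_7\lambda_3$
$\quad + 12N(16N^4 - 106N^3 + 285N^2 - 360N + 180)\lambda_6\lambda_4$
$\quad + 12N^2(17N^3 - 71N^2 + 117N - 69)\lambda_6\lambda_2^2$
$\quad + 4N(29N^4 - 195N^3 + 537N^2 - 693N + 351)\lambda_5^2$
$\quad + 48N^2(26N^3 - 122N^2 + 213N - 129)\lambda_5\lambda_3\lambda_2$
$\quad + 24N^2(35N^3 - 173N^2 + 309N - 189)\lambda_4^2\lambda_2$
$\quad + 24N^2(62N^3 - 326N^2 + 597N - 369)\lambda_4\lambda_3^2$
$\quad + 96N^3(4N^2 - 9N + 6)\lambda_4\lambda_2^3 + 288N^3(4N^2 - 9N + 6)\lambda_3^2\lambda_2^2].$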


The University of Michigan,
Ann Arbor, Mich. 



ON THE NON-EXISTENCE OF TESTS OF “STUDENT’S” HYPOTHESIS
HAVING POWER FUNCTIONS INDEPENDENT OF σ

By George B. Dantzig 

1. Introduction. Consider a system of $n$ random variables $x_1, x_2, \ldots, x_n$
where each is known to be normally distributed about the same but unknown
mean, $\xi$, and with the same, but also unknown, standard deviation $\sigma$. The
assumption, $H_0$, that $\xi$ has some specified value, $\xi_0$, e.g. $\xi_0 = 0$, while nothing
is assumed about $\sigma$, is known as the “Student” Hypothesis. Two aspects of
the hypothesis $H_0$ have been already studied extensively. If the alternatives
with respect to which it is desired to test $H_0$ assume specifically that $\xi > \xi_0$
(or $\xi < \xi_0$), then we have the so-called asymmetric case of “Student’s Hypothesis”
and it is known [1] that there exists a uniformly most powerful test of $H_0$.
This consists in the rule, originally suggested by “Student,” of rejecting $H_0$
whenever

(1) $t = \dfrac{\bar{x} - \xi_0}{S}\sqrt{n - 1} > t_\alpha,$

where $\bar{x}$ and $S$ denote the mean and the standard deviation of the observed
$x_i$'s and $t_\alpha$ is taken, for example, from Fisher’s Tables [2] with his $P = 2\alpha$.
In other words $t_\alpha$ is such that

(2) $P\{t > t_\alpha \mid H_0\} = \alpha,$

where $\alpha$ is the chosen level of significance. In accordance with the definition
of the uniformly most powerful test, whenever any other rule, $R$, offered to test
the same hypothesis $H_0$ has the same probability $\alpha$ of $H_0$ being rejected when
it is true, the power of this alternative test cannot exceed that of “Student’s”
Test. In other words, if it happens that the true value of $\xi$ is not equal to $\xi_0$
but is greater, then the probability of this circumstance being detected by
“Student’s” test is at least equal to that corresponding to the rule $R$.

If the set of alternative hypotheses is not limited to those specifying the
value of $\xi$ either greater or smaller than $\xi_0$, but includes both those categories,
then it is known [1] that there is no uniformly most powerful test of the
hypothesis $H_0$. However, in this case there exists a slightly different test, also
based on “Student’s” criterion $t$, possessing the remarkable property of being
unbiased of type $B_1$ [3]. The test, in common use for a long time, consists in
rejecting $H_0$ when

(3) $|t| > t_\alpha,$

with $t_\alpha$ being taken again from Fisher’s tables, this time corresponding to his
$P = \alpha$, where $\alpha$ is the chosen level of significance.

In order to describe the optimum property of this test we must use the concept
of the power function of a test [3]. Denote by $\beta(\xi, \sigma)$ the probability of
the hypothesis $H_0$ being rejected when $\xi$ and $\sigma$ are the true mean and the true
standard error of the observable $x_i$'s. The function $\beta(\xi, \sigma)$ is just what is
called the power function of the test. If we substitute $\xi = \xi_0$, then we shall
have $\beta(\xi_0, \sigma) = \alpha$ irrespective of the value of $\sigma$. Now the optimum property
of “Student’s” test mentioned above consists in that (1) its power function
has a minimum at $\xi = \xi_0$ and this is true whatever be the value of $\sigma$, (2) whatever
be any other test of the same hypothesis which has the same level of significance
$\alpha$ and has property (1), its power function $\beta'(\xi, \sigma)$ cannot exceed that
of “Student’s” test.

These two properties, demonstrating the excellence of the criterion suggested
by “Student,” fully justify the general confidence in the test as described above,
or in its extended form where it is applied to two or more samples. However,
it is known that “Student’s” test in both its forms, $t > t_\alpha$ and $|t| > t_\alpha$, has
one very undesirable property which causes great difficulties in various problems
of rational planning of experiments.

One of the most important questions to have in mind when planning an
experiment is: What is the probability that the experiment and the subsequent
statistical test will detect a difference or effect when it actually exists? If we
perform an experiment and then apply some statistical analysis to test
“Student’s” hypothesis that $\xi = \xi_0$, we do hope that, if the actual value of $\xi$
is different from $\xi_0$, the test will discover this circumstance. But apart from
mere hope, it is desirable to take precautions so that when the difference,
$\xi - \xi_0 = \Delta$, has some appreciable value, the chance of the hypothesis $H_0$ being
rejected will be reasonably large. This may be done by calculating the value
of the power function $\beta(\xi, \sigma)$ corresponding to the value $\xi = \xi_0 + \Delta$. And
here we come to the unfortunate property of “Student’s” test.

Although the form of the power function of “Student’s” test is known and
tabled [4], [5], [6], [7], there are occasionally considerable difficulties in applying
these tables, because it appears that the values $n$ and $\Delta$ are not all its arguments,
for it also depends on $\sigma$. Consequently in order to have an idea of the probability
that the test will detect the falsehood of the hypothesis $H_0$ that $\xi = \xi_0$
when actually $\xi = \xi_0 + \Delta$ we need not only the knowledge of $n$ but also a
likely value of $\sigma$. The latter is known accurately only in exceptional cases and
then in those cases one would apply a test which is different from “Student’s”
test. Usually we have only a vague notion of the magnitude of $\sigma$ and accordingly
the tables of $\beta(\xi, \sigma)$ may be used to obtain a rough idea as to whether
the arrangement of the experiment planned is satisfactory or not. Frequently
we have no idea of what may be the values of $\sigma$.
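The dependence of the power on $\sigma$ can be seen in a small simulation. The sketch below estimates the power of the two-sided test $|t| > t_\alpha$ by Monte Carlo for $n = 10$ and $\xi_0 = 0$, using $t = (\bar{x} - \xi_0)\sqrt{n-1}/S$ with $S$ computed with divisor $n$. The critical value 2.262 is the two-sided 5% point of $t$ with 9 degrees of freedom; the function names, the number of trials and the seed are arbitrary illustrative choices.

```python
import math
import random


def student_t(sample, xi0):
    """t = (mean - xi0) / S * sqrt(n - 1), with S using divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - xi0) / s * math.sqrt(n - 1)


def estimated_power(delta, sigma, n=10, t_crit=2.262, trials=4000, seed=0):
    """Monte Carlo estimate of P{|t| > t_crit} when xi = xi0 + delta, xi0 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(delta, sigma) for _ in range(n)]
        if abs(student_t(sample, 0.0)) > t_crit:
            rejections += 1
    return rejections / trials


for sigma in (1.0, 2.0, 3.0):
    print(sigma, estimated_power(delta=1.0, sigma=sigma))
```

For a fixed difference $\Delta$ the estimated power falls as $\sigma$ grows, while doubling both $\Delta$ and $\sigma$ leaves it unchanged, so the tables cannot be entered without some value of $\sigma$.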

To Dr. P. L. Hsu is due the idea of looking for tests, the power of which is
independent of the parameters unspecified by the hypothesis tested. In an
unpublished paper, he proved among other things that the $\lambda$ test of the general
linear hypothesis is the most powerful of all those, the power function of which
depends on the same argument as that of the $\lambda$ test and not on other parameters.
The above circumstances suggest the following problem: to see whether it is
possible to devise a test of “Student’s” hypothesis such that its power function
would be independent of $\sigma$. If such a test could be devised and proved to be
reasonably powerful then the tables of its power function could be used for the
purpose of planning experiments.

The purpose of the present paper is to show that no such test exists and,
consequently, this negative result implies in still another way that it is
impossible to improve on the test originally suggested by “Student.”

2. Statement of the Problem. The problem of finding a test whose power
function is independent of $\sigma$ is equivalent to finding a critical region $w$ such
that the value of the power function

(4) $\beta(\xi, \sigma) = P\{E \in w \mid \xi, \sigma\}$

for any fixed $\xi$ is independent of the value of $\sigma$, where $E$ denotes the sample
point $(x_1, x_2, \ldots, x_n)$. We shall show specifically that if this is the case, then
the power function is also independent of $\xi$; so that the test will reject the
hypothesis tested with the same frequency independently of whether it be correct
or wrong.

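3. Theorem. If there exists a region $w$ such that, whatever be the value of $\sigma$, 

(5) $\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \int \cdots \int_{w} \exp\left\{-\dfrac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \xi_0)^{2}\right\} dx_1 \cdots dx_n = \alpha$ 

(6) $\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \int \cdots \int_{w} \exp\left\{-\dfrac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \xi_1)^{2}\right\} dx_1 \cdots dx_n = \beta$ 

where $\xi_0 \neq \xi_1$, $\alpha$, $\beta$ are constants, then 

(7) $\alpha = \beta$. 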

A region $w$ is called similar [1] to the whole sample space, $W$, of size $\alpha$, with 
respect to a set of elementary probability laws $p(E \mid \theta)$ given in terms of a 
parameter $\theta$, if $P\{E \in w \mid \theta\} = \alpha$, whatever be the value of $\theta$. Essentially, 
then, the region $w$ above is a similar region with respect to two different sets 
of elementary laws, each being given parametrically in terms of the parameter $\sigma$. 

Denote by $w_r$ the portion of the surface of the hypersphere, $\sum_{i=1}^{n} (x_i - \xi_0)^2 = r^2$, 
which is common to $w$, and let the total surface be denoted by $W_r$. Neyman 
and Pearson have shown [1] that a necessary and sufficient condition that $w$ 
be a similar region, in the above case, is that, whatever be $r$, the probability 



on “student’s” hypothesis 




that the sample point $E$ will fall on the subsurface $w_r$, when it is known that 
the sample point lies on the surface $W_r$, is $\alpha$, i.e. 

(8) $P\{E \in w_r \mid (E \in W_r)(\xi = \xi_0)\} = \alpha$ 

for all $r$. 

In a similar manner let $w_\rho$ denote the portion of the surface of the hypersphere 
$\sum_{i=1}^{n} (x_i - \xi_1)^2 = \rho^2$ common to $w$, and let the total surface be denoted 
by $W_\rho$. Since $w$ is similar with respect to the set of probability laws indicated in (6), we 
have also 

(9) $P\{E \in w_\rho \mid (E \in W_\rho)(\xi = \xi_1)\} = \beta$ 

for all $\rho$. 

Since on the surface $W_r$ the elementary probability law, 

(10) $\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-r^{2}/2\sigma^{2}}$, 

is constant, we see that an equivalent statement of (8) is that the hyper-area of 
$w_r$ is a constant proportion, $\alpha$, of the total hyper-area of $W_r$. Similarly, from (9), 
we have that the hyper-area of $w_\rho$ is a constant proportion, $\beta$, of the area of the 
hypersurface $W_\rho$, whatever be the values of $r$ and $\rho$. 

Consider the transformation which expresses $x_1, x_2, \ldots, x_n$ in terms of generalized 
polar coordinates with pole at the point $(\xi_0, \xi_0, \ldots, \xi_0)$, i.e. 

(11) 
$x_1 - \xi_0 = r \cos\theta_2 \cos\theta_3 \cdots \cos\theta_{n-2} \cos\theta_{n-1} \cos\theta_n$ 
$x_2 - \xi_0 = r \cos\theta_2 \cos\theta_3 \cdots \cos\theta_{n-2} \cos\theta_{n-1} \sin\theta_n$ 
$x_3 - \xi_0 = r \cos\theta_2 \cos\theta_3 \cdots \cos\theta_{n-2} \sin\theta_{n-1}$ 
$\cdots$ 
$x_{n-1} - \xi_0 = r \cos\theta_2 \sin\theta_3$ 
$x_n - \xi_0 = r \sin\theta_2$ 

Let $\Delta$ be the Jacobian of the transformation: 

(12) $|\Delta| = r^{n-1} \prod_{i=2}^{n} \cos^{i-2}\theta_{n+2-i} = r^{n-1}\, T(\theta_i)$. 

Consider also a transformation which expresses $(x_1, x_2, \ldots, x_n)$ in terms of polar 
coordinates, the point $(\xi_1, \xi_1, \ldots, \xi_1)$ being pole. It may be obtained by 
replacing in (11) $\xi_0$ by $\xi_1$, $r$ by $\rho$, and $\theta_i$ by $\theta_i'$. The Jacobian of this 
transformation is given by $|\Delta'| = \rho^{n-1}\, T(\theta_i')$. 

We are now able to express the hyper-area of $W_r$: 

(13) $\int \cdots \int |\Delta|\, d\theta_2\, d\theta_3 \cdots d\theta_n = r^{n-1} \int \cdots \int T(\theta_i)\, d\theta_2\, d\theta_3 \cdots d\theta_n = K r^{n-1}$, 

where the integral $K > 0$ is a constant independent of $r$. Similarly the hyper-area 
of $W_\rho$ is $K\rho^{n-1}$, where $K$ is the same as in (13). According to (8) and 
(9) we have, now 

(14) $\int \cdots \int_{w_r} |\Delta|\, d\theta_2\, d\theta_3 \cdots d\theta_n = \alpha K r^{n-1}$ 

(15) $\int \cdots \int_{w_\rho} |\Delta'|\, d\theta_2'\, d\theta_3' \cdots d\theta_n' = \beta K \rho^{n-1}$ 

Let us consider the distances between the three points: $(x_1, x_2, \ldots, x_n)$, 
$(\xi_0, \xi_0, \ldots, \xi_0)$, and $(\xi_1, \xi_1, \ldots, \xi_1)$. The distances of the first point to the 
second point and to the third point we have already denoted by $r$ and $\rho$. Let 
the distance between the last two be $L$; then, since the sum of two sides of a 
triangle is at least equal to the third side, we have 

(16) $r \le \rho + L$, $\rho \le r + L$, where $L = \sqrt{n}\, |\xi_0 - \xi_1|$. 

Let $\varphi(t) \ge 0$ be an arbitrary monotonic nonincreasing function of $t$, such that 
the product $t^{n-1}\varphi(t)$ is integrable from 0 to $+\infty$. Since $\varphi(t)$ is a nonincreasing 
function it follows from (16) that 

(17) $\varphi(r) \ge \varphi(\rho + L)$ and $\varphi(\rho) \ge \varphi(r + L)$. 

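Consider the integral $I$: 

(18) $I = \int \cdots \int_{w} \varphi(r)\, dx_1\, dx_2 \cdots dx_n$. 

We shall express it in terms of the variables $r, \theta_2, \ldots, \theta_n$ and also in terms of 
$\rho, \theta_2', \ldots, \theta_n'$ and compare the results. Thus 

(19) $I = \int \cdots \int_{w} |\Delta|\, \varphi(r)\, dr\, d\theta_2 \cdots d\theta_n = \int_0^{\infty} \varphi(r)\, dr \int \cdots \int_{w_r} |\Delta|\, d\theta_2 \cdots d\theta_n = \alpha K \int_0^{\infty} r^{n-1} \varphi(r)\, dr$. 

Also we have by (16) 

(20) $I = \int \cdots \int_{w} |\Delta'|\, \varphi(r)\, d\rho\, d\theta_2' \cdots d\theta_n' \ge \int \cdots \int_{w} |\Delta'|\, \varphi(\rho + L)\, d\rho\, d\theta_2' \cdots d\theta_n' = \int_0^{\infty} \varphi(\rho + L)\, d\rho \int \cdots \int_{w_\rho} |\Delta'|\, d\theta_2' \cdots d\theta_n'$ 

and consequently 

(21) $I \ge \beta K \int_0^{\infty} \rho^{n-1} \varphi(\rho + L)\, d\rho$. 

Since $K > 0$, we have from (19) and (21) 

(22) $\alpha/\beta \ge \int_0^{\infty} t^{n-1} \varphi(t + L)\, dt \Big/ \int_0^{\infty} t^{n-1} \varphi(t)\, dt$. 

By interchanging $\rho$ and $r$ in (18), (19), (20), and (21) we have also 

(23) $\beta/\alpha \ge \int_0^{\infty} t^{n-1} \varphi(t + L)\, dt \Big/ \int_0^{\infty} t^{n-1} \varphi(t)\, dt$. 

Let us set in (22) and (23) $\varphi(t) = e^{-pt}$ and $\varphi(t + L) = e^{-pL} e^{-pt}$, where $p > 0$ 
is arbitrary. Then 

(24) $\alpha/\beta \ge e^{-pL}$ and $\beta/\alpha \ge e^{-pL}$. 

Since (24) holds for all $p > 0$, let $p$ approach zero. Then $\lim e^{-pL} = 1$, and 
the above inequalities can hold only if 

(25) $\alpha = \beta$, Q.E.D. 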

It is of interest to note that there do exist regions such that the power function 
is independent of both $\xi$ and $\sigma$. For example, let $S_n$ be the standard 
deviation of the observed values $(x_1, x_2, \ldots, x_n)$ and let $S_{n-1}$ be the standard 
deviation of the values $(x_1, x_2, \ldots, x_{n-1})$; then the region $w$ given by all 
points $(x_1, x_2, \ldots, x_n)$ which satisfy the inequality $(S_{n-1}/S_n) \le C$ is such a 
region, i.e. 

(26) $P\{(S_{n-1}/S_n) \le C \mid \xi, \sigma\}$ 

is constant, whatever be the values of $\xi$ and $\sigma$. Such regions are, however, 
unsuitable for testing “Student’s” hypothesis $\xi = \xi_0$, because they will reject 
this hypothesis when it is wrong and when it is correct with equal frequency. 
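The closing remark can be checked numerically. The following is only an illustrative Monte Carlo sketch, not part of the paper: the sample size n = 5, the constant C = 0.9, the number of trials and the function names `pop_sd` and `rejection_rate` are all assumptions chosen for the illustration. It estimates the probability in (26) for several pairs of the mean and standard deviation; the estimates should agree up to sampling error.

```python
import math
import random


def pop_sd(values):
    """Standard deviation with divisor len(values)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def rejection_rate(xi, sigma, n=5, c=0.9, trials=20000, seed=1):
    """Monte Carlo estimate of P{S_{n-1}/S_n <= C | xi, sigma}.

    n, c, trials and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(xi, sigma) for _ in range(n)]
        # Compare S_{n-1} <= C * S_n instead of dividing, to avoid a zero denominator.
        if pop_sd(x[:-1]) <= c * pop_sd(x):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Different means and standard deviations, independent random streams.
    for seed, (xi, sigma) in enumerate([(0.0, 1.0), (3.0, 0.5), (-10.0, 20.0)]):
        print(xi, sigma, rejection_rate(xi, sigma, seed=seed))
```

The ratio of the two standard deviations is unchanged when every observation is shifted and rescaled, which is why the estimated frequency does not move with the mean or the standard deviation.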

The author is indebted to Professor J. Neyman for assistance in preparing 
the present paper. 

REFERENCES 

[1] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical 
hypotheses,” Phil. Trans. Roy. Soc. London, Vol. 231 (1933), pp. 289-337. 
[2] R. A. Fisher, Statistical Methods for Research Workers, Oliver & Boyd, 7th edition, 
London, 1938. 
[3] J. Neyman, “Sur la vérification des hypothèses statistiques composées,” Bull. Soc. 
Math. de France, T. 63 (1936), pp. 246-266. 
[4] S. Kolodziejczyk, “Sur l’erreur de la deuxième catégorie dans le problème de 
‘Student,’” Comptes Rendus, T. 197 (1933), p. 814. 
[5] J. Neyman with co-operation of K. Iwaszkiewicz and S. Kolodziejczyk, “Statistical 
problems in agricultural experimentation,” Suppl. Jour. Roy. Stat. Soc., Vol. 
II (1935), pp. 107-80. 
[6] J. Neyman and B. Tokarska, “Errors of the second kind in testing ‘Student’s’ 
hypothesis,” Jour. Am. Stat. Ass., Vol. 31 (1936), pp. 318-26. 
[7] P. C. Tang, “The power function of the analysis of variance tests with tables and 
illustrations of their use,” Stat. Res. Memoirs, Vol. II (1938), pp. 126-67. 

University of California, 

Berkeley, California. 



A METHOD FOR RECURRENT COMPUTATION OF ALL THE 
PRINCIPAL MINORS OF A DETERMINANT, AND ITS 
APPLICATION IN CONFLUENCE ANALYSIS 


By Olav Reiersøl 

1. Recurrent computation of all the principal minors of a determinant. 

The formulae which I develop in this paper have been worked out for use in 
statistical confluence analysis. By means of recurrent computation they shorten 
considerably the amount of work required to compute all principal minors of a 
square matrix. Originally I elaborated this method as a simplification of one 
given by Frisch (not published). 

Subsequently I found that the method could more easily be deduced from the 
pivotal method. This method has been described, for example, by Whittaker 
and Robinson [5] and by Aitken [1]. 

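Let us consider a square n-rowed matrix 

(1) $\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{Vmatrix}$ 

Let the adjoint of this matrix be $\|p_{ij}\|$ and let us denote its determinant 
value by $D_{12\cdots n}$. 

Then we have the following identity 

(2) $\begin{vmatrix} p_{n-1,n-1} & p_{n-1,n} \\ p_{n,n-1} & p_{n,n} \end{vmatrix} = D_{12\cdots n}\, D_{12\cdots n-2}$. 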

As Aitken points out, the pivotal method is based upon this identity. 

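Next consider the following matrix which is formed from the matrix (1) by 
striking out the nth row and the (n - 1)th column: 

(3) $\begin{Vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n} \\ a_{n-1,1} & \cdots & a_{n-1,n-2} & a_{n-1,n} \end{Vmatrix}$ 

Let us denote its adjoint by $\|q_{ij}\|$, its determinant value by $A_{12\cdots n}$. 
The determinant 

$\begin{vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n-1} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n-1} \\ a_{n,1} & \cdots & a_{n,n-2} & a_{n,n-1} \end{vmatrix}$ 

we shall denote by $B_{12\cdots n}$. 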

The identity (2) can now be written 

(2') $D_{12\cdots n} = \dfrac{D_{12\cdots n-2,n}\, D_{12\cdots n-2,n-1} - A_{12\cdots n}\, B_{12\cdots n}}{D_{12\cdots n-2}}$. 

If we apply the identity (2) to the matrix (3) we get 

$\begin{vmatrix} q_{n-2,n-2} & q_{n-2,n-1} \\ q_{n-1,n-2} & q_{n-1,n-1} \end{vmatrix} = A_{12\cdots n}\, D_{12\cdots n-3}$, 

which may also be written 

(4) $A_{12\cdots n} = \dfrac{A_{12\cdots n-3,n-1,n}\, D_{12\cdots n-2} - A_{12\cdots n-3,n-2,n}\, B_{12\cdots n-1}}{D_{12\cdots n-3}}$. 

To simplify the notation we will not write the affixes present, but write the 
affixes not present in inverted parentheses. Then our formulae (2') and (4) 
can be written 

$D = \dfrac{D_{)n-1(}\, D_{)n(} - AB}{D_{)n-1,n(}}$, 

$A = \dfrac{A_{)n-2(}\, D_{)n-1,n(} - A_{)n-1(}\, B_{)n(}}{D_{)n-2,n-1,n(}}$. 

In an analogous way we get 

$B = \dfrac{B_{)n-2(}\, D_{)n-1,n(} - B_{)n-1(}\, A_{)n(}}{D_{)n-2,n-1,n(}}$. 

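We may apply these formulae to an arbitrary principal minor $D_{v_1 v_2 \cdots v_k}$. 
Let us now denote $D_{v_1 v_2 \cdots v_k}$ by $D$ and denote the absence of one or more of the 
numbers $v_1, v_2, \ldots, v_k$ by placing them into inverted parentheses. We then 
have the formulae: 

(5a) $D = \dfrac{D_{)v_{k-1}(}\, D_{)v_k(} - AB}{D_{)v_{k-1},v_k(}}$ 

(5b) $A = \dfrac{A_{)v_{k-2}(}\, D_{)v_{k-1},v_k(} - A_{)v_{k-1}(}\, B_{)v_k(}}{D_{)v_{k-2},v_{k-1},v_k(}}$ 

(5c) $B = \dfrac{B_{)v_{k-2}(}\, D_{)v_{k-1},v_k(} - B_{)v_{k-1}(}\, A_{)v_k(}}{D_{)v_{k-2},v_{k-1},v_k(}}$ 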





COMPUTATION OP DETERMINANTS 




By means of these formulae we can recurrently compute all principal minors. 
We begin with $D_i = a_{ii}$, $i = 1, 2, \ldots, n$, $A_{ij} = a_{ij}$, $B_{ij} = a_{ji}$, where $i < j$. 
Then we compute the D's with two affixes, 

$D_{ij} = D_i D_j - A_{ij} B_{ij}$, 

and then the quantities A, B, D with three affixes, 

$A_{ijk} = A_{jk} D_i - A_{ik} B_{ij}$ 
$B_{ijk} = B_{jk} D_i - B_{ik} A_{ij}$ 
$D_{ijk} = \dfrac{D_{ik} D_{ij} - A_{ijk} B_{ijk}}{D_i}$, $i < j < k$. 

Then we compute the quantities A, B, D with four affixes, and so on. 

If we carry through the computations without dropping any figures we have 
as a control that all divisions will be exact without remainder. If we are 
dropping figures we can control the result by computing the determinant 
in another way. If we wish to control the computation before it is com¬ 
pleted, we may use our recurrence formulae on the matrix which we get from 
the original matrix when the rows and the columns are subjected to the same 
permutation. For example we can reverse the order of the rows and columns. 
Then we can control the (k — 1) rowed minors before computing the fc-rowed 
minors. 
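The recurrence (5) and the exact-division control can be illustrated with a short program. This is a sketch under stated assumptions rather than the author's computing scheme: affixes are zero-based tuples, exact rational arithmetic (`Fraction`) stands in for "not dropping any figures", the names `principal_minors` and `det` are hypothetical, and the 4-rowed test matrix `M` is invented for the illustration (it is diagonally dominant so that no principal minor vanishes).

```python
from fractions import Fraction
from itertools import combinations


def principal_minors(a):
    """All principal minors of a square matrix via formulae (5a)-(5c).

    Keys are increasing tuples of zero-based affixes; D[()] = 1 by convention.
    Every principal minor used as a divisor must be non-zero.
    """
    n = len(a)
    D = {(): Fraction(1)}
    A, B = {}, {}
    for i in range(n):
        D[(i,)] = Fraction(a[i][i])
    for k in range(2, n + 1):
        for v in combinations(range(n), k):
            if k == 2:
                i, j = v
                A[v] = Fraction(a[i][j])   # A_ij = a_ij
                B[v] = Fraction(a[j][i])   # B_ij = a_ji
            else:
                head, (p, q, r) = v[:-3], v[-3:]
                no_p, no_q, no_r = head + (q, r), head + (p, r), head + (p, q)
                A[v] = (A[no_p] * D[head + (p,)] - A[no_q] * B[no_r]) / D[head]  # (5b)
                B[v] = (B[no_p] * D[head + (p,)] - B[no_q] * A[no_r]) / D[head]  # (5c)
            # (5a): D = (D_)v_{k-1}( * D_)v_k( - A*B) / D_)v_{k-1},v_k(
            D[v] = (D[v[:-2] + (v[-1],)] * D[v[:-1]] - A[v] * B[v]) / D[v[:-2]]
    return D


def det(m):
    """Determinant by Laplace expansion along the first row (check only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


# Hypothetical integer test matrix, strictly diagonally dominant.
M = [[4, 1, 2, 0],
     [1, 5, 1, 2],
     [3, 0, 6, 1],
     [1, 2, 1, 7]]

if __name__ == "__main__":
    for affixes, value in sorted(principal_minors(M).items()):
        print(affixes, value)
```

With integer entries every value printed is an integer, which is the control mentioned in the text: the divisions in (5) leave no remainder.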

If all the D's are different from zero, we may reduce the necessary number of 
multiplications and divisions in the following way. We introduce the following 
notations: 

$d = \dfrac{D}{D_{)v_k(}}$, $a = \dfrac{A}{D_{)v_{k-1},v_k(}}$, $b = \dfrac{B}{D_{)v_{k-1},v_k(}}$. 

Substituting in (5), we get the following system of recurrence formulae: 

(6a) $a = a_{)v_{k-2}(} + a_{)v_{k-1}(}\, c_{)v_k(}$ 

(6b) $b = b_{)v_{k-2}(} + c_{)v_{k-1}(}\, a_{)v_k(}$ 

(6c) $c = -\dfrac{b}{d_{)v_k(}}$ 

(6d) $d = d_{)v_{k-1}(} + ac$ 

(6e) $D = D_{)v_k(}\, d$. 

An affix $v_k$ on a letter indicates the deletion of the last row and column in the 
determinants making up the definition of that letter, even though those determinants 
are of lower order than $k$. Similarly, an affix $v_{k-1}$ indicates the deletion 
of the next to the last row and column. 

The a's with two affixes in these formulae are identical with the elements $a_{ij}$ 
of the matrix (1) where $i < j$. Further, $b_{ij} = a_{ji}$, $i < j$, $d_i = a_{ii}$. Applying 
the recurrence formulae (6) we start with these values. 

If the matrix (1) is symmetric, i.e. if $a_{ij} = a_{ji}$, then we get 

$A_{v_1 v_2 \cdots v_k} = B_{v_1 v_2 \cdots v_k}$ 

and 

$a_{v_1 v_2 \cdots v_k} = b_{v_1 v_2 \cdots v_k}$. 

In this case we can therefore replace B by A in the formulae (5) and replace b 
by a in the formulae (6). 
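For the symmetric case the recurrence (6) can be sketched as follows. This is an illustration only, written on the assumption that (6a), (6c), (6d) and (6e) are read with b replaced by a; the function names `principal_minors_symmetric` and `det` are hypothetical, affixes are zero-based, and floating-point arithmetic replaces the hand computation. The matrix `FRISCH` is the correlation matrix of the numerical example that follows in the paper.

```python
from itertools import combinations

# Correlation matrix of the constructed example quoted from Frisch.
FRISCH = [
    [1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
    [-0.121551, 1.000000, 0.657698, -0.732862, 0.212165],
    [0.656809, 0.657698, 1.000000, 0.014385, -0.040183],
    [0.752502, -0.732862, 0.014385, 1.000000, -0.280223],
    [-0.224549, 0.212165, -0.040183, -0.280223, 1.000000],
]


def principal_minors_symmetric(r):
    """Principal minors of a symmetric matrix by recurrence (6) with b = a."""
    n = len(r)
    a, c = {}, {}
    d = {(i,): r[i][i] for i in range(n)}   # d_i = a_ii
    D = {(i,): r[i][i] for i in range(n)}   # D_i = a_ii
    for k in range(2, n + 1):
        for v in combinations(range(n), k):
            if k == 2:
                a[v] = r[v[0]][v[1]]
            else:
                head, (p, q, s) = v[:-3], v[-3:]
                # (6a): a = a_)v_{k-2}( + a_)v_{k-1}( * c_)v_k(
                a[v] = a[head + (q, s)] + a[head + (p, s)] * c[head + (p, q)]
            c[v] = -a[v] / d[v[:-1]]                    # (6c) with b = a
            d[v] = d[v[:-2] + (v[-1],)] + a[v] * c[v]   # (6d)
            D[v] = D[v[:-1]] * d[v]                     # (6e)
    return D


def det(m):
    """Determinant by Laplace expansion along the first row (check only)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


if __name__ == "__main__":
    for affixes, value in sorted(principal_minors_symmetric(FRISCH).items()):
        # Print one-based affixes, as in the paper's table.
        print("".join(str(i + 1) for i in affixes), round(value, 6))
```

Each k-rowed minor with k of 3 or more costs three multiplications and one division here, one multiplication fewer than in the general case because (6b) is not needed.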

Numerical example. Let us compute all the scatterances in the constructed 
example given by Frisch [3, p. 121]. The correlation matrix in this example is: 

 1.000000   -0.121551    0.656809    0.752502   -0.224549 
-0.121551    1.000000    0.657698   -0.732862    0.212165 
 0.656809    0.657698    1.000000    0.014385   -0.040183 
 0.752502   -0.732862    0.014385    1.000000   -0.280223 
-0.224549    0.212165   -0.040183   -0.280223    1.000000 

Using our recurrence formulae (6) we get the following table: 

Affixes        a             c             d             D 

12        -0.121 551     0.121 551     0.985 225     0.985 225 
13         0.656 809    -0.656 809     0.568 602     0.568 602 
23         0.657 698    -0.657 698     0.567 433     0.567 433 
14         0.752 502    -0.752 502     0.433 741     0.433 741 
24        -0.732 862     0.732 862     0.462 913     0.462 913 
34         0.014 385    -0.014 385     0.999 793     0.999 793 
15        -0.224 549     0.224 549     0.949 578     0.949 578 
25         0.212 165    -0.212 165     0.954 986     0.954 986 
35        -0.040 183     0.040 183     0.998 385     0.998 385 
45        -0.280 223     0.280 223     0.921 475     0.921 475 

123        0.737 534    -0.748 594     0.016 489     0.016 245 
124       -0.641 395     0.651 014     0.016 184     0.015 945 
134       -0.479 865     0.843 938     0.028 765     0.016 356 
234        0.496 387    -0.874 794     0.028 677     0.016 272 
125        0.184 871    -0.187 643     0.914 888     0.901 371 
135        0.107 303    -0.188 714     0.929 328     0.528 418 
235       -0.179 723     0.316 730     0.898 062     0.509 590 
145       -0.111 249     0.256 487     0.921 044     0.399 495 
245       -0.124 735     0.269 457     0.921 272     0.426 516 
345       -0.279 645     0.279 703     0.920 167     0.919 977 

1234       0.000 279    -0.016 6       0.016 179     0.000 262 83 
1235      -0.031 090     1.885 5       0.856 268     0.013 910 
1245       0.009 105    -0.562 6       0.909 766     0.014 506 
1345      -0.020 692     0.719 35      0.914 443     0.014 957 
2345       0.032 486    -1.132 8       0.861 262     0.014 014 

12345      0.009 621    -0.594 7       0.850 546     0.000 223 55 


2. Computation of the coefficients of the characteristic polynomial of a 
matrix. The characteristic polynomial of the matrix (1) is 

$P(x) = \begin{vmatrix} a_{11} - x & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - x & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - x \end{vmatrix} = P_n - P_{n-1} x + P_{n-2} x^2 - \cdots + (-1)^n x^n$. 

As is well known, the coefficient $P_k$ can be calculated as the sum of all the 
$k$-rowed principal minors of the matrix (1). Our method of computing all the 
principal minors of a matrix therefore gives us as a by-product a method of 
computing the coefficients of the characteristic polynomial. Another method 
for the determination of these coefficients has been given by Paul Horst [4]. 

We may obtain a comparison between the work of computation entailed by 
the two methods by calculating the number of multiplications and divisions 
necessary when using one or the other method. If our recurrence formulae (6) 
are used, two multiplications and one division are necessary for computing a 
2-rowed minor, and 4 multiplications and one division for every minor with 3 
or more rows. Consequently the total number of multiplications and divisions 
will be: 

$S_n = 3\binom{n}{2} + 5\sum_{k=3}^{n}\binom{n}{k} = 5 \cdot 2^n - (n^2 + 4n + 5)$. 

On using Horst’s method, the number of necessary multiplications and divisions 
will be found to be 

$H_n = (\tfrac{1}{2}n - 1)n^3 + \tfrac{1}{2}n^3 + \tfrac{1}{2}(n - 1)(n + 2) = \tfrac{1}{2}(n - 1)(n^3 + n + 2)$, n even, 

$H_n = \tfrac{1}{2}(n - 1)(n^3 + n^2 + n + 2)$, n odd. 

When $n = 2, 3, \ldots, 12$, $S_n$ and $H_n$ acquire the following values: 

  n       S_n       H_n 

  2         3         6 
  3        14        41 
  4        43       105 
  5       110       314 
  6       255       560 
  7       558      1203 
  8      1179      1827 
  9      2438      3284 
 10      4975      4554 
 11     10070      7325 
 12     20283      9581 

We see that our method of computing the coefficients of the characteristic 
polynomial involves less calculation when $n < 10$, while Horst's method is 
superior when $n \ge 10$. 
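The two operation counts are easy to tabulate. The sketch below simply evaluates the closed forms, taking $S_n = 5 \cdot 2^n - (n^2 + 4n + 5)$ (the constant that reproduces the tabulated values) and $H_n$ as given for even and odd n; the function names are hypothetical.

```python
def ops_recurrence(n):
    """S_n: multiplications and divisions needed by the recurrence formulae (6)."""
    return 5 * 2 ** n - (n * n + 4 * n + 5)


def ops_horst(n):
    """H_n: multiplications and divisions needed by Horst's method."""
    if n % 2 == 0:
        return (n - 1) * (n ** 3 + n + 2) // 2
    return (n - 1) * (n ** 3 + n ** 2 + n + 2) // 2


if __name__ == "__main__":
    print(" n    S_n    H_n")
    for n in range(2, 13):
        print(f"{n:2d} {ops_recurrence(n):6d} {ops_horst(n):6d}")
```

Since $S_n$ grows like $2^n$ and $H_n$ like $n^4$, the crossover between n = 9 and n = 10 seen in the table is permanent.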

If our purpose is to find the characteristic roots of the matrix we can do this 
with less amount of computation without first finding the coefficients of the char¬ 
acteristic polynomial. See Aitken, [2]. 

3. Applications in confluence analysis. The confluence analysis of Frisch is 
set forth in his book: “Statistical Confluence Analysis by Means of Complete 
Regression Systems” [3]. 

The main method of this book is the “bunch analysis,” which includes the 
computation of the adjoints of the correlation matrices of all sets of variates 
contained in the total set. In section 1, Frisch has described a preliminary 
analysis by means of scatterances. The scatterances are the principal minors 
of the correlation matrix of the total set of variates. If we carry through such 
an analysis, the recurrence formulae of section 1 of this paper will give a rapid 
method for the calculation of all the scatterances. 

Another application of the computation of all the scatterances arises in the 
determination of the correct time lags between variates in a structural equation. 
This problem will be treated in a paper on confluence analysis which will appear 
in the near future. 

REFERENCES 

[1] A. C. Aitken, “On the Evaluation of Determinants, the Formation of their Adjugates, 
and the Practical Solution of Simultaneous Linear Equations,” Proc. of Edinburgh 
Math. Soc., Second Series, Vol. 3 (1932-33), p. 207. 
[2] A. C. Aitken, “Studies in Practical Mathematics II,” Proc. Royal Soc. of Edinburgh, 
Vol. 57 (1936-37), p. 269. 
[3] Ragnar Frisch, “Statistical Confluence Analysis by Means of Complete Regression 
Systems,” Publication no. 6 from the University Institute of Economics, Oslo, 1934. 
[4] Paul Horst, “A Method for Determining the Coefficients of a Characteristic Equation,” 
Annals of Math. Stat., Vol. VI (1935), p. 83. 
[5] Whittaker and Robinson, The Calculus of Observations, London, 1924. 

Institute of Economics, 

University of Oslo, 

Oslo, Norway. 



NOTES 

This section is devoted to brief research and expository articles , notes on methodology 
and other short items. 


A CRITERION FOR TESTING THE HYPOTHESIS THAT TWO 
SAMPLES ARE FROM THE SAME POPULATION 

By W. J. Dixon 

1. Introduction. The purpose of this paper is to consider a criterion for 
testing the hypothesis that two samples have been drawn from populations with 
the same distribution function, assuming only that the cumulative distribution 
function common to the two populations is continuous. Let the two samples, 
$O_n$ and $O_m$, be of size $n$ and $m$ respectively. We may assume $n \le m$ without 
loss of generality. Suppose the elements $u_1, \ldots, u_n$ of $O_n$ are arranged in order 
from the smallest to the largest, that is, $u_1 < u_2 < \cdots < u_n$. These may be 
represented as points along a line. The elements of $O_m$ represented as points 
on the same line are then divided into $(n + 1)$ groups by the first sample, $O_n$. 
Let $m_1$ be the number of points having a value less than $u_1$, $m_{i+1}$ the number 
lying between $u_i$ and $u_{i+1}$ $(i = 1, 2, \ldots, n - 1)$, and $m_{n+1}$ the number greater than 
$u_n$ $(m_{n+1} = m - m_1 - m_2 - \cdots - m_n)$. The criterion here proposed is¹ 

(1) $C^2 = \sum_{i=1}^{n+1} \left(\dfrac{1}{n+1} - \dfrac{m_i}{m}\right)^2$. 


1 A similar criterion 



for two samples of the same size was investigated (unpublished) by A. M. Mood. He 
found the mean and variance to be 


E(d>) 


2n + 1 
3n 1 


* 


It - 


8(n - l)(2n + 1) 

45n* 


It can be seen that this is the sum of the squares of the differences between the ordinates 
of the two cumulative sample distributions calculated at the jumps of the first sample 
distribution. 




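2. The mean and variance of $C^2$. The only case of continuous cumulative 
distribution functions $F(x)$ of any interest in statistics is that in which $dF(x) = 
f(x)\,dx$, where $f(x)$ is a probability density function. Let us write: 

$p_1 = \int_{-\infty}^{u_1} f(x)\,dx$, $p_i = \int_{u_{i-1}}^{u_i} f(x)\,dx$ $(i = 2, \ldots, n)$, $p_{n+1} = \int_{u_n}^{\infty} f(x)\,dx$, 

where of course $p_{n+1} = 1 - p_1 - p_2 - \cdots - p_n$. 

Now, the joint distribution law of the $p_i$ is 

(2) $P(p_1, \ldots, p_n) = n!\, dp_1 \cdots dp_n$ 

and the conditional distribution of the $m_i$ given the $p_i$ is 

(3) $P(m_1, \ldots, m_{n+1} \mid p_1, \ldots, p_n) = \dfrac{m!}{m_1! \cdots m_{n+1}!}\, p_1^{m_1} p_2^{m_2} \cdots p_{n+1}^{m_{n+1}}$. 

Therefore the joint probability law of the $m_i$ and $p_i$ is 

(4) $P(m, p) = \dfrac{n!\, m!}{m_1! \cdots m_{n+1}!}\, p_1^{m_1} p_2^{m_2} \cdots p_{n+1}^{m_{n+1}}\, dp_1 \cdots dp_n$. 

Let $\varphi(\theta) = \varphi(\theta_1, \ldots, \theta_{n+1}) = E\left[\exp \sum_{i=1}^{n+1} \theta_i \left(\dfrac{1}{n+1} - \dfrac{m_i}{m}\right)\right]$; then 

(5) $E(C^2) = \sum_{i=1}^{n+1} \left.\dfrac{\partial^2 \varphi}{\partial \theta_i^2}\right|_{\theta = 0}$, 

(6) $E[(C^2)^2] = \sum_{i} \left.\dfrac{\partial^4 \varphi}{\partial \theta_i^4}\right|_{\theta = 0} + \sum_{i \neq j} \left.\dfrac{\partial^4 \varphi}{\partial \theta_i^2\, \partial \theta_j^2}\right|_{\theta = 0}$, 

and 

(7) $\varphi(\theta) = \sum_{m} \int \exp\left[\sum_{i=1}^{n+1} \theta_i \left(\dfrac{1}{n+1} - \dfrac{m_i}{m}\right)\right] P(m, p)$, 

where $\sum_m$ denotes the usual multinomial summation over all integral values of 
$m_i \ge 0$ for which $\sum m_i = m$ and the integration is over the generalized tetrahedron 
defined by $p_i \ge 0$ and $p_1 + p_2 + \cdots + p_n \le 1$. If we perform 
the summation first, we obtain 

(8) $\varphi(\theta) = n!\, e^{\sum_{i=1}^{n+1} \theta_i/(n+1)} \int \left(p_1 e^{-\theta_1/m} + \cdots + p_{n+1} e^{-\theta_{n+1}/m}\right)^m dp_1 \cdots dp_n$. 

Differentiating twice with respect to $\theta_i$ and setting the $\theta$'s equal to zero, we get 

$\left.\dfrac{\partial^2 \varphi}{\partial \theta_i^2}\right|_{\theta = 0} = n! \int \left[\dfrac{1}{(n+1)^2} + \left(\dfrac{1}{m} - \dfrac{2}{n+1}\right) p_i + \dfrac{m-1}{m}\, p_i^2\right] dp_1 \cdots dp_n$. 

If we now integrate and sum from one to $n + 1$, we find 

(9) $E(C^2) = \dfrac{n(m + n + 1)}{m(n + 1)(n + 2)}$. 

Performing the operations indicated in (6), we obtain $E[(C^2)^2]$, from which we 
subtract $[E(C^2)]^2$ and have as the variance of $C^2$, 

(10) $\sigma^2_{C^2} = \dfrac{4n(m - 1)(m + n + 1)(m + n + 2)}{m^3 (n + 2)^2 (n + 3)(n + 4)}$. 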

3. Significance values of $C^2$. If we let $C^2_\alpha$ be defined as the smallest value 
of $C^2$ for which $P(C^2 \ge C^2_\alpha) \le \alpha$ then we can compute the value of $C^2_\alpha$ fairly 


TABLE I 

Values of $C^2_\alpha$; $\alpha$ = 0.01, 0.05, 0.10 


\» 
m \ 

2 

3 

4 

5 

6 

7 

8 

9 10 

4 


— 

.800 






5 



.800 

.833 







.750 

.800 

.833 

.857 




6 


.750 

.800 

.833 

.857 






.750 

.800 

.556 

.413 








.833 

.867 

.875 



7 


.750 

.800 

.588 

.612 

.467 




.667 

.750 

.555 

.425 

.449 

.426 






.800 

.833 

.857 

.656 

.670 


8 


.750 

.800 

.594 

.482 

.469 

.389 



.667 

.531 

.425 

.413 

.367 

.375 

.358 





.800 

.833 

.660 

.677 

.543 

.554 

9 


.750 

.602 

.448 

.413 

.431 

.395 

.381 


.667 

.552 

.454 

.389 

.363 

.356 

.321 

.307 




.800 

.833 

.677 

.555 

.549 

.480 .449 

10 

.667 

.750 

.480 

.493 

.437 

.415 

.349 

.340 .349 


.487 

.430 

.380 

.373 

.357 

.315 

.309 

.280 

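readily for small values of $m$ and $n$. The values of $C^2_\alpha$ for $m, n \le 10$ are given 
in Table I for $\alpha$ = 0.01, 0.05 and 0.10. Since the distribution of $C^2$ is 
discontinuous the probabilities $P(C^2 \ge C^2_\alpha)$ will, in general, be less than $\alpha$. 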





It will be seen that if $m$ and $n$ increase indefinitely in the ratio $n/m = \gamma$, 
then $nC^2$ converges stochastically to $\gamma + 1$, whereas $nC^2$ ranges from 0 to 
$n^2/(n + 1)$, which indicates a tail to the right. This suggests that for larger 
values of $m$ and $n$, it is reasonable to try to fit the distribution of $nC^2$ by the 
method of moments using a distribution of the form 

(11) $\dfrac{k^{v/2}}{2^{v/2}\, \Gamma(\tfrac{1}{2}v)}\, (x^2)^{\frac{1}{2}v - 1}\, e^{-\frac{1}{2}k x^2}\, d(x^2)$, 

which has 

$E(x^2) = \dfrac{v}{k}$, $\sigma^2_{x^2} = \dfrac{2v}{k^2}$. 

Setting $x^2 = nC^2$, we see that we can consider $nkC^2$ distributed as $\chi^2$ with $v$ 
degrees of freedom. Of course, $v$ is not necessarily an integer, but $\chi^2$ tables 
may be used for approximate values of the probability that $nkC^2$ will exceed 
certain values,² or the values of $nkC^2$ that will be exceeded a certain per cent 
of the time.³ More exact values of these probabilities that $nkC^2$ will exceed 
a certain value may be found from a table of the incomplete Gamma function.⁴ 

To calculate $k$ and $v$ directly, the following formulas obtained by equating 
the mean and variance of (11) to the mean and variance of $nC^2$ may be used: 

(12) $k = am(n + 2)/n$, $v = an(n + m + 1)/(n + 1)$, 

where 

$a = \dfrac{m(n + 3)(n + 4)}{2(m - 1)(m + n + 2)(n + 1)}$. 

If the fitted curve (11) is used to obtain significance values of $nC^2$, there is a 
tendency toward rejecting slightly over $100\alpha\%$, especially for small values of 
$m$ and $n$. The error is probably due to fitting a curve having an infinite range. 
The discrepancy decreases as $m$ and $n$ increase. 

The goodness of fit at the 0.01, 0.05 and 0.10 significance levels was tested 
for two cases. 

Case 1. n = 9, m = 10; nk = v = fy. 

The exact distribution in the region under consideration is the following: 

$C_0^2$                 ...  .26  ||  .28   .30   .32  ||  .34   .36   .40   .42  ||  .44   .48  ... 
$P(C^2 \ge C_0^2)$      ... .121  || .090  .082  .072  || .037  .033  .026  .025  || .016  .007  ... 

The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.422$, $C^2_{.05} = 0.323$ and 
$C^2_{.10} = 0.277$. The double rule indicates the divisions (from the fitted curve) 
for $\alpha$ = 0.01, 0.05 and 0.10. 


² Karl Pearson, Tables for Statisticians and Biometricians, part 1, Table XII. 
³ R. A. Fisher, Statistical Methods for Research Workers, Table III. 
⁴ Tables of the Incomplete Gamma Function, Biometrika Office, London. 




Case 2. $n = 12$, $m = 12$; $nk = 65.068$, $v = 8.938$. 

The important part of the exact distribution for our purposes is: 

$C_0^2$                 ... .215  .229  .243  .256  .270  ...  .326  .340  .354  .381  ... 
$P(C^2 \ge C_0^2)$      ... .120  .109  .078  .067  .046  ...  .017  .014  .011  .009  ... 

The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.3315$, $C^2_{.05} = 0.2587$ and 
$C^2_{.10} = 0.2244$. 


4. Examples. 1. Two samples of ten members each are drawn and it is 
desired to test, using a rejection region of size $\alpha$, the hypothesis that these two 
samples could have originated from the same population about which nothing 
is assumed except that it is continuous. The first sample was found to divide 
the second sample into the following groups: 0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0. 

$C^2 = (\tfrac{1}{11} - \tfrac{3}{10})^2 + (\tfrac{1}{11} - \tfrac{4}{10})^2 + (\tfrac{1}{11} - \tfrac{2}{10})^2 + (\tfrac{1}{11} - \tfrac{1}{10})^2 + 7(\tfrac{1}{11})^2 = .209$ 

which we see from Table I is not a significant value even for $\alpha = 0.10$ since 
$C^2_{.10} = 0.269$. 

2. A sample of 15 divides a second of 25 into the following 16 groups: 0, 1, 
0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0. 

$C^2 = (\tfrac{1}{16} - \tfrac{5}{25})^2 + (\tfrac{1}{16} - \tfrac{4}{25})^2 + (\tfrac{1}{16} - \tfrac{3}{25})^2 + (\tfrac{1}{16} - \tfrac{9}{25})^2 + 4(\tfrac{1}{16} - \tfrac{1}{25})^2 + 8(\tfrac{1}{16})^2$ 

$nC^2 = 2.302$, $k = 7.511$, $v = 10.19$, 
$nkC^2 = 17.295$ 

which gives a significant value for $\alpha = 0.10$ but not for $\alpha = 0.05$, since $nkC^2_{.10} = 
16.233$, $nkC^2_{.05} = 18.568$. Actually $P(nkC^2 > 17.29) = .077$. 
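The arithmetic of the two examples can be retraced with a few lines of code. This is a sketch only: the function names `c_squared` and `fitted_k_v` are hypothetical, and the formulas used are the criterion (1) and the fitting constants (12) with the quantity a defined there. The last step of Example 2, looking up the probability for a non-integral number of degrees of freedom, is not attempted.

```python
def c_squared(groups):
    """Criterion (1): sum over the n + 1 groups of (1/(n+1) - m_i/m)^2."""
    m = sum(groups)        # size of the second sample
    cells = len(groups)    # n + 1, where n is the size of the first sample
    return sum((1.0 / cells - g / m) ** 2 for g in groups)


def fitted_k_v(n, m):
    """Constants k and v of formula (12) for the fitted chi-square curve."""
    a = m * (n + 3) * (n + 4) / (2.0 * (m - 1) * (m + n + 2) * (n + 1))
    k = a * m * (n + 2) / n
    v = a * n * (n + m + 1) / (n + 1)
    return k, v


EXAMPLE_1 = [0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0]                  # n = 10, m = 10
EXAMPLE_2 = [0, 1, 0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0]   # n = 15, m = 25

if __name__ == "__main__":
    print("Example 1: C^2 =", round(c_squared(EXAMPLE_1), 3))
    n, m = len(EXAMPLE_2) - 1, sum(EXAMPLE_2)
    k, v = fitted_k_v(n, m)
    c2 = c_squared(EXAMPLE_2)
    print("Example 2: nC^2 =", round(n * c2, 3), "k =", round(k, 3),
          "v =", round(v, 2), "nkC^2 =", round(n * k * c2, 3))
```

Carried to full precision the constants differ from the printed ones in the third decimal place, which is consistent with rounding at intermediate steps of a hand computation.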


5. Remarks. If we set $W$ equal to the number of $m_i$ which are zero and 
$V = n + 1 - W$, then $V$ is the number of non-zero $m_i$; further, $2V \approx U$, where 
$U$ is the total number of runs, the criterion proposed in the paper of Wald 
and Wolfowitz in the present issue of the Annals of Mathematical Statistics. 
Now, 

«+i 

(13) W = lim 

*li■•*.**+ 

so that, setting 

(14) • - £./ exp [g * (,-ij - =)] % 


analogous to (7), we have 

E{WC') 


















JOHN W. MAUCHLY 


from which we can find 


2n(l — rn) 

Prc ta v a c* = rnln + i^nT+n) 
and 


1 ^ s _ (n + 3)(w + 4) ( to + n - 1) 

Ptrc* — Prc* ( n _j_ + n + 1)(to + n + 2)' 

If n/m — y (a fixed constant) and n is large 

s n 

P —-• 

n + to 

$\rho$ will be near 1 when $n$ is much larger than $m$. This corresponds, in computing 
$C^2$, to dividing the smaller sample into subgroups by the larger. In 
this case $U$ and $C^2$ give essentially the same information. When $m$ and $n$ are 
more nearly equal the two criteria are quite different. For $n > m$, $C^2$ has 
fewer possible values than for $n < m$, and is therefore a more sensitive test 
when $n < m$. 

While it is doubtful that this test is biased for large samples, this question 
will not be considered in the present note. 

Princeton University, 

Princeton, N. J. 


SIGNIFICANCE TEST FOR SPHERICITY OF A NORMAL n-VARIATE 

DISTRIBUTION 

By John W. Mauchly 

1. Introduction. This note is concerned with testing the hypothesis that a 
sample from a normal n-variate population is in fact from a population for 
which the variances are all equal and the correlations are all zero. A popula¬ 
tion having this symmetry will be called “spherical.” Under a linear orthogonal 
transformation of variates, a spherical population remains spherical, and conse¬ 
quently the features of a sample which furnish information relevant to this 
hypothesis must be invariant under such transformations. 

A situation for which this test is indicated arises when the sample consists 
of N n-dimensional vectors, for which the variates are the n components along 
coordinate axes known to be mutually perpendicular, but having an orientation 
which is, a priori at least, quite arbitrary. A specific application for two 
dimensions, treated elsewhere [1], may be mentioned. Each of N days furnishes 
a sine and a cosine Fourier coefficient for a given periodicity, and these, 
when plotted as ordinate and abscissa, yield a somewhat elliptical cloud of N 
points. The sine and cosine functions are orthogonal, and their variances have 




TEST fOB SPHERICITY 


equal expectancies for a random series. The arbitrary nature of the orientation 
of axes appears here as the arbitrary choice of phase, or origin of time. Of the 
five ellipses studied, three could easily have come from circular populations 
(random), and two showed highly significant ellipticity. 

2. Likelihood ratio criterion for sphericity. The method of Neyman and 
Pearson [2] will be used to derive a test criterion which seems entirely suitable. 
Let $\Omega$ be the class of all normal n-variate populations, and let $\omega$ be the subclass 
of all normal n-variate populations satisfying the hypothesis of “sphericity.” 
The likelihood ratio criterion is obtained by taking the ratio of the maximum 
of the likelihood for variation of all population parameters specifying $\omega$, to the 
maximum of the likelihood for variation of all population parameters specifying 
$\Omega$. That is, 

(1) $\lambda = \dfrac{P(\omega \max)}{P(\Omega \max)}$. 

For the set $\Omega$, the probability law for a single observation of the n variates 
may be written: 

(2) $P = K\, |c_{ij}|^{1/2}\, e^{-\frac{1}{2} \sum_{i,j} c_{ij}(x_i - a_i)(x_j - a_j)}$ $(i, j = 1, 2, \ldots, n)$, 

where $c_{ij}$ is an element of the matrix $\|\sigma_{ij}\|^{-1}$, the $\sigma_{ij}$ being variances and 
covariances, $a_i$ is the mean value of the $i$th variate in the population, and $K$ is a 
constant the value of which does not concern us here. Then a sample of $N$ 
from $\Omega$ has the probability, 

(3) $P = K^N\, |c_{ij}|^{N/2} \exp\left\{-\tfrac{1}{2} \sum_{\alpha=1}^{N} \sum_{i,j} c_{ij}(x_{i\alpha} - a_i)(x_{j\alpha} - a_j)\right\}$. 

Letting 

(4) $\sum_{\alpha=1}^{N} x_{i\alpha} = N\bar{x}_i$ and $\sum_{\alpha=1}^{N} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) = N s_{ij}$, 

differentiating the logarithm of $P$ with respect to the parameters $a_i$ and $c_{ij}$, 
and setting these derivatives equal to zero, the maximum likelihood estimates, 

(5) $\hat{a}_i = \bar{x}_i$, $\hat{\sigma}_{ij} = s_{ij}$, 

are obtained. Substituting these values in equation (3) we find that the maximum 
value of the likelihood is 

(6) $P(\Omega \max) = K^N\, |s_{ij}|^{-N/2}\, e^{-nN/2}$. 

The derivation of $P(\omega \max)$ proceeds upon similar lines, but is simpler, for 
the probability law for the set $\omega$ is obtained from (3) by setting 

(7) $c_{ij} = c\,\delta_{ij}$, 

where $c$ is any positive constant, and $\delta_{ij} = 0$ if $i \neq j$ and 1 if $i = j$. The result 
is found to be 

(8) $P(\omega \max) = K^N (s_0)^{-nN/2}\, e^{-nN/2}$ 

where $s_0$ is defined by 

(9) $n s_0 = \sum_{i=1}^{n} s_{ii}$. 

The likelihood ratio criterion is therefore 

(10) $\lambda = \dfrac{|s_{ij}|^{N/2}}{s_0^{\,nN/2}}$. 

It will be convenient to designate the Nth root of this statistic as $L_{sn}$, where 
the second subscript indicates the number of variates: 

(11) $L_{sn} = \dfrac{|s_{ij}|^{1/2}}{s_0^{\,n/2}}$. 

3. The moments of the distribution of $L_{sn}$ when the population is spherical. 

The distribution of $L_{sn}$ cannot be easily obtained in explicit form for a general n, 
but the moments of $L_{sn}$ when the hypothesis tested is true are easily found. 

Note first that $L_{sn}$ may be resolved into two factors which are, when the 
population is spherical, statistically independent: 

(12) $L_{sn} = \dfrac{(s_{11}\, s_{22}\, s_{33} \cdots s_{nn})^{1/2}}{s_0^{\,n/2}} \cdot |r_{ij}|^{1/2}$. 

The first factor is just the one appropriate for testing the equality of the n 
variances when the orientation of the coordinate axes is fixed in advance, while 
the second factor is the square root of the determinant of correlation coefficients. 
The moments of the distributions of these two statistics are known [3], and 
since the two are independent (for zero correlation in the population), we may 
write: 

(13) $M_h(L_{sn}) = M_h(A)\, M_h(B)$, 

where $A$ and $B$ are used to indicate the two factors, and $M_h$ indicates the $h$th 
moment. The moments are given by 

(14) $M_h(L_{sn}) = \prod_{i=1}^{n} \left\{\dfrac{\Gamma[\tfrac{1}{2}(N - i + h)]}{\Gamma[\tfrac{1}{2}(N - i)]}\right\} \dfrac{n^{nh/2}\, \Gamma[\tfrac{1}{2}n(N - 1)]}{\Gamma[\tfrac{1}{2}n(N - 1 + h)]}$. 

4. Significance test for n = 2. For n = 1, $M_h(L_{s1}) = 1$ for any h, as it should, since $L_{s1}$ is then identically 1, and the concept of sphericity is meaningless. For n = 2, the expression (14) reduces to

(15) $M_h(L_{s2}) = \dfrac{\Gamma(N-2+h)\,\Gamma(N-1)}{\Gamma(N-1+h)\,\Gamma(N-2)} = \dfrac{N-2}{N-2+h},$



TEST FOR SPHERICITY


and the distribution is thus found to be

(16) $D(L_{s2}) = (N-2)\,L_{s2}^{\,N-3}\,dL_{s2}.$

Thus for n = 2, the significance of the value $L'_{s2}$ obtained from a given sample of N points in a plane is simply

(17) $P(L_{s2} < L'_{s2}) = (L'_{s2})^{N-2}.$

These results for n = 2 were obtained by another method in [1]. 
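The n = 2 test is simple enough to evaluate directly. The following is a minimal sketch, assuming the forms $D(L_{s2}) = (N-2)L_{s2}^{N-3}dL_{s2}$ for (16) and $P(L_{s2} < L'_{s2}) = (L'_{s2})^{N-2}$ for (17); the function names and the sample values are hypothetical and are not part of the paper.

```python
def significance_n2(l_obs, n_points):
    """Equation (17): P(L_s2 < l_obs) = l_obs ** (N - 2) under sphericity."""
    if not 0.0 <= l_obs <= 1.0:
        raise ValueError("the criterion lies between 0 and 1")
    if n_points < 3:
        raise ValueError("equation (17) needs N >= 3")
    return l_obs ** (n_points - 2)


def moment_from_density(h, n_points, steps=100_000):
    """h-th moment of L_s2 by midpoint integration of the density (16)."""
    width = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width
        total += x ** h * (n_points - 2) * x ** (n_points - 3)
    return total * width


# Hypothetical sample: N = 12 points in a plane, observed criterion 0.60.
print(significance_n2(0.60, 12))
# The numerical moment should agree with (N - 2)/(N - 2 + h) from (15).
print(moment_from_density(2, 12), 10 / 12)
```

A small returned probability indicates that so low a value of the criterion would rarely arise from a circular population.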


5. Significance test for n = 3. For n = 3 and higher values of n, no simple expression for the distribution seems obtainable. In this case it appears reasonable to fit a Pearson curve of the type

(18) $y = K x^{p-1}(1 - x)^{q-1},$

by adjusting p and q so as to obtain agreement with the first two moments of the actual distribution. The calculations were carried out for $L_{s3}^2$ rather than $L_{s3}$ itself, to simplify the moment expressions. The first moment of $L_{s3}^2$ is the second moment of $L_{s3}$, and is given as a function of N by the equation

(19) $M_2(L_{s3}) = \dfrac{(3N-6)(3N-9)}{(3N-2)(3N-1)}.$


Recurrence relations, similar to those noted by Lengyel [4] in carrying out a similar task, hold for the moments of $L_{s3}^2$; hence,

(20) $m_2(N) = m_1(N)\,m_1(N+2).$

Explicit solution of the equations for p and q in terms of N is possible:

(21) $p = \dfrac{(9N+5)(N-2)(N-3)}{2(9N^2-8N-15)},$

(22) $q = \dfrac{2(9N-13)(9N+5)}{9(9N^2-8N-15)}.$

For values of N > 30, acceptable approximations to p and q are obtained by carrying out the division indicated in (21) and (22):

(23) $p = \tfrac12(N-4) + \dfrac{2}{9} + \dfrac{70}{81(N+1)} + \cdots,$

(24) $q = 2 + \dfrac{140}{9(3N-2)^2}.$


The values of p and q are given in Table I so that those desiring other than 
the standard significance levels may readily enter the Pearson tables. 
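For readers who wish to enter the Pearson tables at a value of N not listed, p and q can be computed directly. This is a minimal sketch, assuming (21) and (22) read $p = (9N+5)(N-2)(N-3)/[2(9N^2-8N-15)]$ and $q = 2(9N-13)(9N+5)/[9(9N^2-8N-15)]$, that (19) reads $(3N-6)(3N-9)/[(3N-2)(3N-1)]$, and that (20) reads $m_2(N) = m_1(N)m_1(N+2)$. The function names are hypothetical. Exact rational arithmetic is used so that the two moment conditions can be checked without rounding.

```python
from fractions import Fraction


def pearson_p_q(n_points):
    """Exact p and q of the fitted Pearson Type I curve, equations (21), (22)."""
    n = Fraction(n_points)
    denom = 9 * n * n - 8 * n - 15
    p = (9 * n + 5) * (n - 2) * (n - 3) / (2 * denom)
    q = 2 * (9 * n - 13) * (9 * n + 5) / (9 * denom)
    return p, q


def first_moment(n_points):
    """Equation (19): first moment of the squared criterion."""
    n = Fraction(n_points)
    return (3 * n - 6) * (3 * n - 9) / ((3 * n - 2) * (3 * n - 1))


for n_points in (8, 20, 50, 100):
    p, q = pearson_p_q(n_points)
    # Mean of the fitted curve, p/(p+q), must reproduce equation (19).
    assert p / (p + q) == first_moment(n_points)
    # Its second moment must reproduce the recurrence (20).
    second = p * (p + 1) / ((p + q) * (p + q + 1))
    assert second == first_moment(n_points) * first_moment(n_points + 2)
    print(n_points, round(float(p), 4), round(float(q), 4))
```

The printed values can be compared with the p and q columns of Table I.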

For N a multiple of 4 from 8 to 48, and a multiple of 10 from 50 to 100, the significance levels were taken from the Incomplete Beta-Function Tables, using adequate interpolation. The final Table I was then prepared by filling in the skeleton table by interpolation with respect to N.

From the results of Wilks [5] it follows that $-2N \log_e L_{sn}$ is, for large N,
TABLE I

5%, 1%, and 0.1% levels of significance for the 3-dimensional sphericity criterion, $L_{s3}^2 = \lambda^{2/N}$, and the values of p and q for the Pearson Type I curves used in calculating these levels

   N      5%      1%     0.1%        p         q
   8    0.172   0.083   0.030     2.3239    2.0312
  10     .278    .165    .080     3.3044    2.0194
  12     .366    .243    .139     4.2911    2.0131
  14     .436    .312    .197     5.2816    2.0095
  16     .494    .372    .252     6.2744    2.0072
  18     .541    .423    .301     7.2688    2.0057
  20     .580    .466    .346     8.2642    2.0046
  22     .614    .504    .386     9.2605    2.0038
  24     .642    .538    .422    10.2574    2.0032
  26     .667    .567    .454    11.2548    2.0027
  28     .689    .593    .483    12.2526    2.0023
  30     .708    .616    .510    13.2506    2.0020
  32     .724    .637    .534    14.2488    2.0018
  34     .739    .655    .555    15.2473    2.0016
  36     .753    .672    .575    16.2458    2.0014
  38     .765    .687    .594    17.2447    2.0012
  40     .776    .701    .610    18.2435    2.0011
  42     .786    .714    .626    19.2425    2.0010
  44     .795    .726    .640    20.2416    2.0009
  46     .804    .736    .653    21.2408    2.0008
  48     .811    .746    .665    22.2400    2.0008
  50     .819    .756    .677    23.2394    2.0007
  55     .834    .776    .703       *         *
  60     .848    .793    .725    28.2365    2.0005
  65     .859    .808    .744       *         *
  70     .869    .821    .760    33.2345    2.0004
  75     .877    .832    .775       *         *
  80     .885    .842    .788    38.2328    2.0003
  85     .891    .851    .799       *         *
  90     .897    .859    .809    43.2317    2.0002
  95     .902    .866    .819       *         *
 100     .907    .872    .827    48.2308    2.0002

* No values for p and q were calculated for these values of N; the levels were obtained by interpolation (see text).


distributed approximately like $\chi^2$ with n(n - 1)/2 degrees of freedom. However, equation (24) above suggests that for large N one may get a very good approximation (for n = 3) by setting q = 2; the significance test for n = 3 then becomes,

(25) $P(L_{s3} < L'_{s3}) = \tfrac12 (L'_{s3})^{N-4}\left[(N-2) - (N-4)(L'_{s3})^{2}\right].$
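The large-N approximation can be checked against Table I. A minimal sketch follows, assuming (25) reads $P(L_{s3} < L'_{s3}) = \tfrac12 (L'_{s3})^{N-4}[(N-2) - (N-4)(L'_{s3})^2]$, which is the integral of the curve (18) with q = 2 and p = (N - 4)/2 taken in the variable $L_{s3}^2$. The function name is hypothetical.

```python
import math


def significance_n3_approx(l_obs, n_points):
    """Equation (25): approximate P(L_s3 < l_obs), obtained by setting q = 2."""
    if not 0.0 <= l_obs <= 1.0:
        raise ValueError("the criterion lies between 0 and 1")
    bracket = (n_points - 2) - (n_points - 4) * l_obs ** 2
    return 0.5 * l_obs ** (n_points - 4) * bracket


# Table I lists .907 as the 5% level of the squared criterion for N = 100,
# so the approximation should return a value close to 0.05 here.
print(significance_n3_approx(math.sqrt(0.907), 100))
```

Since Table I is entered with the squared criterion, the square root is taken before calling the function.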

Probably similar approximations can be found for other values of n. It is a pleasure to acknowledge the helpful comments and advice which I received from Mr. A. M. Mood of Princeton. Recognition is also due Mr. Wallace Brey, a student assistant under the National Youth Administration, who aided in the computations.


REFERENCES

[1] J. W. Mauchly, Terr. Magn., Vol. 45 (1940) (In press).
[2] Neyman and Pearson, Trans. Roy. Soc., A, 231 (1933), p. 295.
[3] S. S. Wilks, Biometrika, Vol. 24 (1932), p. 471.
[4] B. A. Lengyel, Annals of Math. Stat., Vol. 10 (1939), p. 305.
[5] S. S. Wilks, Annals of Math. Stat., Vol. 9 (1938), p. 60.

Ursinus College,
Collegeville, Pa.


A SIMPLE SAMPLING EXPERIMENT ON CONFIDENCE INTERVALS 

By S. Kullback and A. Frankel 

1. Introduction. In order to illustrate some of the notions of the theory of confidence or fiducial limits in connection with a course in Statistical Inference at the George Washington University, we had the class carry out certain simple experiments, following a suggestion in one of Neyman's papers on Statistical Estimation [1]. In the belief that the experimental data may be of interest to others, we present the results herein.

2. The problem. We consider the problem of estimating the range $\theta$ of a rectangular population defined by $p(x, \theta)\,dx = dx/\theta$, $0 \le x \le \theta$, and in particular, for simplicity, we limit ourselves to samples of two and four. We consider three possible approaches to the problem, viz., by using (a) the sample range, (b) the sample average or total, (c) the larger (largest) sample value. Let us consider each in turn.

(a) Sample range. Wilks [2] has shown that for samples of n and confidence coefficient $1 - \alpha$, the confidence or fiducial limits for the population range $\theta$ are given by r and $r/\psi_\alpha$, where r is the sample range and $\psi_\alpha$ is determined by

(1) $\psi_\alpha^{\,n-1}\left[n - (n-1)\psi_\alpha\right] = \alpha.$

For n = 2, $\alpha$ = 0.19 and n = 4, $\alpha$ = 0.1792, (1) yields $\psi_\alpha$ = 0.1 and $\psi_\alpha$ = 0.4 respectively. Accordingly, for samples of two with confidence coefficient $1 - \alpha$ = 0.81, and for samples of four with confidence coefficient $1 - \alpha$ = 0.8208, the confidence interval is respectively given by

(2) (r, 10r) and (r, 2.5r).

The length, $\lambda_r$, of the confidence interval is respectively 9r and 1.5r. Using the distribution of r, $n(n-1)(\theta - r)r^{n-2}/\theta^{n}$, we have for samples of two: $E(\lambda_r) = 3\theta$, $\sigma_{\lambda_r} = 2.1213\theta$, and for samples of four: $E(\lambda_r) = 0.9\theta$, $\sigma_{\lambda_r} = 0.3\theta$.
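The quantity $\psi_\alpha$ has no convenient closed form for general n, but it is easily found numerically. The sketch below assumes that equation (1) reads $\psi^{n-1}[n - (n-1)\psi] = \alpha$; its left side rises steadily from 0 to 1 as $\psi$ goes from 0 to 1, so bisection applies. The function names are hypothetical.

```python
def psi_alpha(n, alpha, tol=1e-12):
    """Solve psi**(n-1) * (n - (n-1)*psi) = alpha on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid ** (n - 1) * (n - (n - 1) * mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def range_interval(r, n, alpha):
    """Confidence interval (r, r/psi_alpha) for the population range."""
    return r, r / psi_alpha(n, alpha)


print(psi_alpha(2, 0.19), psi_alpha(4, 0.1792))
print(range_interval(0.3, 4, 0.1792))
```

For the two cases treated in the text the roots are 0.1 and 0.4, which give the multipliers 10 and 2.5 of (2).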

(b) Sample total. Following Neyman [1, p. 357] let us denote by $A(\theta)$ the region defined by

(3) $\theta - \Delta \le x_1 + x_2 \le \theta + \Delta,$

where $\theta$ is the population range, $x_1$ and $x_2$ the sample values of the sample $E_2$, and $\Delta$ is selected so as to have $P\{E_2 \in A(\theta) \mid \theta\} = 1 - \alpha$. It is readily found that $P\{E_2 \in A(\theta) \mid \theta\} = [\theta^2 - (\theta - \Delta)^2]/\theta^2 = 1 - \alpha$, from which we find that $\Delta = \theta(1 - \alpha^{1/2})$. Accordingly (3) becomes $\theta\alpha^{1/2} \le x_1 + x_2 \le \theta(2 - \alpha^{1/2})$, yielding the confidence limits $(x_1 + x_2)/(2 - \alpha^{1/2})$, $(x_1 + x_2)/\alpha^{1/2}$. For the confidence coefficient $1 - \alpha$ = 0.81 the confidence interval is given by

(4) $[0.6394(x_1 + x_2),\ 2.2941(x_1 + x_2)].$

The length of the confidence interval is given by $\lambda_T = 1.6547(x_1 + x_2)$ so that $E(\lambda_T) = 1.6547\theta$, $\sigma_{\lambda_T} = 0.6755\theta$.

Let us denote by $A'(\theta)$ the region defined by

(5) $2\theta - \Delta \le x_1 + x_2 + x_3 + x_4 \le 2\theta + \Delta,$

where $\theta$ is the population range, $x_1, x_2, x_3, x_4$ the sample values of the sample $E_4$, and $\Delta$ is selected so as to have $P\{E_4 \in A'(\theta) \mid \theta\} = 1 - \alpha$. Using the known distribution of the sample average [3] and $1 - \alpha$ = 0.8208, it is readily found that

$\tfrac{1}{12}\left\{16\left(\tfrac{\Delta}{\theta}\right) - 8\left(\tfrac{\Delta}{\theta}\right)^{3} + 3\left(\tfrac{\Delta}{\theta}\right)^{4}\right\} = 0.8208,$

from which we find that $\Delta = 0.788\theta$. Accordingly, (5) becomes $1.212\theta \le x_1 + x_2 + x_3 + x_4 \le 2.788\theta$, yielding the confidence interval

(6) $[0.3587(x_1 + x_2 + x_3 + x_4),\ 0.8251(x_1 + x_2 + x_3 + x_4)].$

The length of the confidence interval is given by $\lambda_T = 0.4664(x_1 + x_2 + x_3 + x_4)$ so that $E(\lambda_T) = 0.9328\theta$ and $\sigma_{\lambda_T} = 0.2679\theta$.
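The value of $\Delta$ for samples of four can be recovered numerically. This sketch assumes that the coverage condition is $\tfrac{1}{12}\{16d - 8d^3 + 3d^4\} = 1 - \alpha$ with $d = \Delta/\theta$ and $0 \le d \le 1$, which is what the distribution of the sum of four rectangular variates gives; the function names are hypothetical.

```python
def coverage_sum_of_four(d):
    """P(|x1 + x2 + x3 + x4 - 2*theta| <= d*theta) for 0 <= d <= 1."""
    return (16 * d - 8 * d ** 3 + 3 * d ** 4) / 12


def solve_delta(target, tol=1e-12):
    """Bisection for d; coverage_sum_of_four is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage_sum_of_four(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d = solve_delta(0.8208)
# Multipliers of the sample total giving the lower and upper limits of (6).
print(d, 1 / (2 + d), 1 / (2 - d))
```

The root is close to 0.788 and the two multipliers are close to the 0.3587 and 0.8251 of (6).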

(c) Larger (largest) sample value. Again following Neyman [1, p. 359] let us denote by $A_1(\theta)$ the region defined by

(7) $q\theta \le L \le \theta,$

where $\theta$ is the population range, L the larger of the two sample values $x_1$ and $x_2$, and q a number between zero and unity, to be determined by $P\{E_2 \in A_1(\theta) \mid \theta\} = 1 - \alpha$. It is readily found that $P\{E_2 \in A_1(\theta) \mid \theta\} = (\theta^2 - q^2\theta^2)/\theta^2 = 1 - \alpha$, from which we find that $q = \alpha^{1/2}$. Accordingly, (7) becomes $\theta\alpha^{1/2} \le L \le \theta$, yielding the confidence limits L, $L/\alpha^{1/2}$. For the confidence coefficient $1 - \alpha$ = 0.81 the confidence interval is given by

(8) (L, 2.2941L).
TABLE I

No. of cases of                              Frequency
coverage per             Range                  Sum             Larger (Largest)
set of 100         Samples   Samples     Samples   Samples     Samples   Samples
samples            of two    of four     of two    of four     of two    of four

     69               -         -           -         -           1         -
     70               -         -           -         -           -         -
     71               -         -           -         -           1         -
     72               -         -           -         -           -         -
     73               -         -           -         -           -         1
     74               -         1           -         -           1         -
     75               -         -           -         -           -         -
     76               4         -           3         -           4         1
     77               2         -           6         1           2         -
     78               3         -           6         -           3         1
     79               9         2           4         2           3         -
     80               3         1           6         -           4         -
     81               2         2           1         -           3         -
     82               2         1           6         1           2         5
     83               3         3           -         1           5         3
     84               3         2           3         1           4         1
     85               3         -           -         3           2         -
     86               2         2           -         2           2         1
     87               1         1           2         1           -         1
     88               -         -           1         2           1         1
     89               1         -           1         1           -         -
     90               -         -           -         -           -         -
     91               1         -           -         -           -         -

                     39        15          39        15          39        15

Average....          81.1      82.1        80.2      84.2        80.2      82.1


The length of the confidence interval is given by $\lambda_L = 1.2941L$ so that using the distribution of L, $nL^{n-1}\,dL/\theta^{n}$, we have $E(\lambda_L) = 0.8627\theta$ and $\sigma_{\lambda_L} = 0.3050\theta$. Incidentally, since $L \le x_1 + x_2$ we have $1.2941L < 1.6547(x_1 + x_2)$ so that in every case, for samples of two, the confidence interval of procedure (c) is shorter than the confidence interval of procedure (b).

For samples of four, we consider the region (7) where L is the largest of the sample values $x_1$, $x_2$, $x_3$ and $x_4$ of the sample $E_4$. It is readily found that $P\{E_4 \in A_1(\theta) \mid \theta\} = (\theta^4 - q^4\theta^4)/\theta^4 = 1 - \alpha$, from which we find that $q^4 = \alpha$. For $\alpha$ = 0.1792, q = 0.6506 so that (7) becomes $0.6506\theta \le L \le \theta$ yielding the confidence interval

(9) (L, 1.5370L).

The length of the confidence interval is given by $\lambda_L = 0.5370L$ so that $E(\lambda_L) = 0.4296\theta$ and $\sigma_{\lambda_L} = 0.0877\theta$.
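The constants of procedure (c) follow from $q^n = \alpha$ and from the first two moments of the largest of n rectangular variates. A minimal sketch, assuming the density of L is $nL^{n-1}/\theta^n$ on $(0, \theta)$; the function names are hypothetical.

```python
import math


def largest_value_interval(largest, n, alpha):
    """Interval (L, L / alpha**(1/n)) obtained from region (7) with q**n = alpha."""
    return largest, largest / alpha ** (1.0 / n)


def expected_length(n, alpha, theta=1.0):
    """Mean and standard deviation of the interval length (upper - lower)."""
    factor = alpha ** (-1.0 / n) - 1.0
    mean_l = n * theta / (n + 1)
    sd_l = theta * math.sqrt(n / ((n + 1) ** 2 * (n + 2)))
    return factor * mean_l, factor * sd_l


print(largest_value_interval(1.0, 2, 0.19), expected_length(2, 0.19))
print(largest_value_interval(1.0, 4, 0.1792), expected_length(4, 0.1792))
```

The printed multipliers and moments can be set against (8), (9) and the values of $E(\lambda_L)$ and $\sigma_{\lambda_L}$ quoted in the text.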


TABLE II

                                   Sample        Range               Sum           Larger (Largest)
                                    size    Theo-     Ob-       Theo-     Ob-       Theo-     Ob-
                                            retical   served    retical   served    retical   served

Confidence Coefficient                2      .8100     .8110     .8100     .802      .8100     .8020
                                      4      .8208     .8210     .8208     .842      .8208     .8210

Average length of confidence          2     3.0000    2.9660    1.6547    1.6441     .8627     .8556
interval per set of 100 samples       4      .9000     .8976     .9328     .9296     .4296     .4272

Standard deviation of average         2      .2121     .2133     .0676     .0581     .0305     .0293
length of confidence interval         4      .0300     .0335     .0268     .0140     .0088     .0093


3. The Experimental Data. We considered the rectangular population with $\theta$ = 1 and obtained the sample values by using pairs of digits obtained from Tippett's random sample tables [4]. Using these observed values the confidence intervals given by (2), (4), (6), (8) and (9) were computed and the number of cases in which the value $\theta$ = 1 was covered, noted. In all, 3900 samples of two were observed, subdivided into 39 sets of 100 each. The samples of four were obtained by combining pairs of samples of two and there were studied 1500 samples of four, subdivided into 15 sets of 100 each. Table I gives the observed distribution of the number of cases of coverage per set of 100 samples of two and of four. The length of the confidence interval obtained by each of the three procedures was obtained and the observed mean and standard deviation of the distribution of the average length of the confidence interval per set of 100 samples computed. (Since they are averages of 100 values, these observations are practically normally distributed.) Table II summarizes these results.
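The experiment is easy to repeat by machine. The sketch below is not the authors' procedure: it assumes pseudo-random numbers from Python's `random` module in place of pairs of digits from Tippett's tables, draws each sample of four afresh rather than by combining samples of two, and uses the multipliers of (2), (4), (6), (8) and (9). The names `FACTORS` and `coverage` are hypothetical.

```python
import random

# (lower, upper) multipliers applied to the statistic of each procedure.
FACTORS = {
    2: {"range": (1.0, 10.0), "sum": (0.6394, 2.2941), "largest": (1.0, 2.2941)},
    4: {"range": (1.0, 2.5), "sum": (0.3587, 0.8251), "largest": (1.0, 1.5370)},
}


def coverage(n, samples, seed=0, theta=1.0):
    """Fraction of samples whose interval covers theta, for each procedure."""
    rng = random.Random(seed)
    hits = {name: 0 for name in FACTORS[n]}
    for _ in range(samples):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        stats = {"range": max(xs) - min(xs), "sum": sum(xs), "largest": max(xs)}
        for name, (lower, upper) in FACTORS[n].items():
            if lower * stats[name] <= theta <= upper * stats[name]:
                hits[name] += 1
    return {name: count / samples for name, count in hits.items()}


print(coverage(2, 3900))   # theoretical confidence coefficient .8100
print(coverage(4, 1500))   # theoretical confidence coefficient .8208
```

With the sample sizes of the text the observed proportions scatter about the theoretical coefficients in much the same way as the observed columns of Table II.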






REFERENCES

[1] J. Neyman, "Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability," Phil. Trans. Roy. Soc., Series A, Vol. 236 (1937), pp. 333-380.
[2] S. S. Wilks, "Fiducial Distributions in Fiducial Inference," Annals of Math. Stat., Vol. IX (1938), pp. 272-280.
[3] P. Hall, "The distribution of means for samples of size N drawn from a population in which the variate takes values between 0 and 1, all such values being equally probable," Biometrika, Vol. 19 (1927), pp. 240-244.
[4] L. H. C. Tippett, Tracts for Computers No. 15, Camb. Univ. Press, 1927.

The George Washington University,
Washington, D. C.


THE NUMERICAL COMPUTATION OF THE PRODUCT OF CONJUGATE 
IMAGINARY GAMMA FUNCTIONS 


By A. C. Cohen, Jr. 

The difference equation

(1) $\dfrac{f_{x+1}}{f_x} = \dfrac{x^2 + c_1 x + c_2}{x^2 + c_3 x + c_4}$

was used by Professor Harry C. Carver [1] as the basis for graduating frequency distributions in a manner analogous to the use of the differential equation

$\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{a - x}{b_0 + b_1 x + b_2 x^2}$

in the Pearson system of frequency curves. In order to determine a particular $f_x$ by Professor Carver's method it was necessary to perform the complete graduation from the lower limit of the range up to and including the required $f_x$. When x is large and only isolated values of $f_x$ are required it seems desirable to have a method for computing $f_x$ directly, and the present note seeks to accomplish this purpose.

It is well known [2] that the difference equation

(2) $\dfrac{f_{x+1}}{f_x} = \dfrac{(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n)}{(x - \beta_1)(x - \beta_2)\cdots(x - \beta_m)}$

has the solution

(3) $f_x = w_x\,\dfrac{\Gamma(x - \alpha_1)\cdots\Gamma(x - \alpha_n)}{\Gamma(x - \beta_1)\cdots\Gamma(x - \beta_m)},$

where $w_x$ is a periodic function of x ($w_x = w_{x+1} = \cdots = k$) and $\Gamma(x + 1)$ for x a positive real number may be defined in the usual manner by the second Euler integral

(4) $\Gamma(x + 1) = \int_0^\infty t^{x} e^{-t}\,dt,$

which obeys the recursion formula

(5) $\Gamma(x + 1) = x\,\Gamma(x).$

When x is a positive integer

(6) $\Gamma(x + 1) = x!$

Equation (1) is seen to be a special case of (2) for n = m = 2 and accordingly, the solution may be written as

(7) $f_x = K\,\dfrac{\Gamma(x - \alpha_1)\Gamma(x - \alpha_2)}{\Gamma(x - \beta_1)\Gamma(x - \beta_2)},$

where $\alpha_1$ and $\alpha_2$ are roots of $x^2 + c_1 x + c_2 = 0$ and $\beta_1$ and $\beta_2$ are roots of $x^2 + c_3 x + c_4 = 0$. The following simple examples illustrate three special cases of this solution.

I. All $\alpha$'s and $\beta$'s are integers.

$\dfrac{f_{x+1}}{f_x} = \dfrac{2(x^2 + 9x + 20)}{x^2 + 5x + 6}$

has the solution

$f_x = K\,2^{x}\,\dfrac{\Gamma(x + 4)\Gamma(x + 5)}{\Gamma(x + 2)\Gamma(x + 3)},$

which, with the aid of recursion formula (5) can readily be verified by direct substitution.

II. Either the $\alpha$'s and/or the $\beta$'s are real irrational numbers.

$\dfrac{f_{x+1}}{f_x} = \dfrac{x^2 + 5x + 6}{x^2 + 3x + 1}$

has the solution

$f_x = K\,\dfrac{\Gamma(x + 2)\Gamma(x + 3)}{\Gamma[x + \tfrac12(3 - \sqrt5)]\,\Gamma[x + \tfrac12(3 + \sqrt5)]},$

which, with the aid of the recursion formula (5) can also be verified by direct substitution.
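The direct substitution mentioned for Cases I and II can also be carried out numerically. The sketch below assumes the solutions $f_x = K 2^x \Gamma(x+4)\Gamma(x+5)/[\Gamma(x+2)\Gamma(x+3)]$ for Case I and $f_x = K\Gamma(x+2)\Gamma(x+3)/\{\Gamma[x+\tfrac12(3-\sqrt5)]\Gamma[x+\tfrac12(3+\sqrt5)]\}$ for Case II, with K set to 1; the function names are hypothetical and `math.gamma` stands in for tables of the Gamma Function.

```python
import math


def f_case1(x, k=1.0):
    """Case I solution with integer roots."""
    return k * 2.0 ** x * math.gamma(x + 4) * math.gamma(x + 5) / (
        math.gamma(x + 2) * math.gamma(x + 3))


def f_case2(x, k=1.0):
    """Case II solution with irrational roots of x^2 + 3x + 1 = 0."""
    root5 = math.sqrt(5.0)
    return k * math.gamma(x + 2) * math.gamma(x + 3) / (
        math.gamma(x + 0.5 * (3 - root5)) * math.gamma(x + 0.5 * (3 + root5)))


def ratio_case1(x):
    return 2 * (x * x + 9 * x + 20) / (x * x + 5 * x + 6)


def ratio_case2(x):
    return (x * x + 5 * x + 6) / (x * x + 3 * x + 1)


for x in (0.0, 1.0, 2.5, 10.0):
    # f_{x+1} / f_x must reproduce the right side of each difference equation.
    assert math.isclose(f_case1(x + 1) / f_case1(x), ratio_case1(x), rel_tol=1e-9)
    assert math.isclose(f_case2(x + 1) / f_case2(x), ratio_case2(x), rel_tol=1e-9)
print("Cases I and II satisfy their difference equations")
```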

III. Either the $\alpha$'s and/or the $\beta$'s are complex.

$\dfrac{f_{x+1}}{f_x} = \dfrac{x^2 + 8x + 17}{x^2 + 10x + 29}$

has the solution

$f_x = K\,\dfrac{\Gamma(x + 4 + i)\Gamma(x + 4 - i)}{\Gamma(x + 5 + 2i)\Gamma(x + 5 - 2i)}.$

Since the recursion formula (5) is also valid for complex arguments [3], this solution can be verified by direct substitution just as in the first two cases. The evaluation of $f_x$ for a given x in cases I and II involves only computation of quantities of the form $\Gamma(x)$ which can be accomplished through the use of existing tables of Gamma Functions for small values of x and through application of Stirling's formula for large values of x. Evaluation of $f_x$ in case III, however, involves the computation of quantities of the form $\Gamma(u + iv)\Gamma(u - iv)$, a problem which seems to have escaped previous attention. The remainder of the present discussion will center about this quantity.

The Gamma Function for a real positive argument has been defined by equation (4), but for the present purposes, it is more expedient to use the definition

(8) $\Gamma(z) = \lim_{n \to \infty} \dfrac{n!\,n^{z}}{z(z + 1)\cdots(z + n)},$

which is valid for all values of the complex argument z except at the poles (z = -1; z = -2, etc.). The above definition is equivalent to (4) at all points where (4) is valid [3].

From equation (8), it immediately follows that $\Gamma(u + iv)\Gamma(u - iv)$ is a real number. In fact, we have

$\Gamma(u + iv)\Gamma(u - iv) = \lim_{n \to \infty} \dfrac{(n!\,n^{u})^{2}}{[u^2 + v^2][(u + 1)^2 + v^2]\cdots[(u + n)^2 + v^2]}.$

We now develop a formula applicable in evaluating this quantity when u is a sufficiently small positive integer. As a consequence of equation (8) it can be shown that [3]

(9) $\Gamma(z)\Gamma(1 - z) = \dfrac{\pi}{\sin \pi z}.$

Let z = iv in the above equation and we immediately obtain the result

(10) $\Gamma(iv)\Gamma(-iv) = \dfrac{2\pi}{v(e^{\pi v} - e^{-\pi v})}.$

When u is a positive integer, we may write

(11) $\Gamma(u + iv) = (u - 1 + iv)(u - 2 + iv)\cdots(iv)\,\Gamma(iv),$

(12) $\Gamma(u - iv) = (u - 1 - iv)(u - 2 - iv)\cdots(-iv)\,\Gamma(-iv).$

The product of (11) by (12) gives

$\Gamma(u + iv)\Gamma(u - iv) = v^2(v^2 + 1)\cdots(v^2 + (u - 1)^2)\,\Gamma(iv)\Gamma(-iv),$

which upon substitution of the value found in Equation (10) for $\Gamma(iv)\Gamma(-iv)$ becomes

(13) $\Gamma(u + iv)\Gamma(u - iv) = \dfrac{2\pi v}{e^{\pi v} - e^{-\pi v}} \prod_{r=1}^{u-1}(v^2 + r^2).$





To obtain a result that is applicable when u is not a positive integer, we make use of Stirling's formula for complex arguments. Lipschitz [4] proves

(14) $\log \Gamma(z) = \log \sqrt{2\pi} + (z - \tfrac12)\log z - z + \sum_{m=0}^{\infty} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{1}{z^{2m+1}}$

and that the remainder after the mth term is

_ (-ii , ... 

"+ 1 (2m + 3) (2m +T) «*■+« ( + } ’ 

where $\epsilon < 1$; $\epsilon' < 1$. $B_{2m+1}$ designates the Bernoulli numbers ($B_1 = 1/6$; $B_3 = 1/30$, etc.). We are thus able to write

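(15) $\log \Gamma(u + iv) = \log \Gamma(Re^{i\varphi}) = \log \sqrt{2\pi} + (Re^{i\varphi} - \tfrac12)(\log R + i\varphi) - Re^{i\varphi} + \sum_{m=0}^{\infty} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{e^{-(2m+1)i\varphi}}{R^{2m+1}},$

where $\varphi = \tan^{-1}(v/u)$ and $R = \sqrt{u^2 + v^2}$,

(16) $\log \Gamma(u - iv) = \log \Gamma(Re^{-i\varphi}) = \log \sqrt{2\pi} + (Re^{-i\varphi} - \tfrac12)(\log R - i\varphi) - Re^{-i\varphi} + \sum_{m=0}^{\infty} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{e^{(2m+1)i\varphi}}{R^{2m+1}}.$

Adding (15) and (16), we obtain

$\log \Gamma(u + iv)\Gamma(u - iv) = \log 2\pi + (e^{i\varphi} + e^{-i\varphi})R \log R - \log R + Ri\varphi(e^{i\varphi} - e^{-i\varphi}) - R(e^{i\varphi} + e^{-i\varphi}) + \sum_{m=0}^{\infty} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{e^{(2m+1)i\varphi} + e^{-(2m+1)i\varphi}}{R^{2m+1}},$

which upon being simplified becomes

(17) $\log \Gamma(u + iv)\Gamma(u - iv) = \log 2\pi + (2u - 1)\log R - 2(\varphi v + u) + 2\psi(R, \varphi),$

where

$\psi(R, \varphi) = \sum_{m=0}^{\infty} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{\cos (2m + 1)\varphi}{R^{2m+1}}.$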


This result is somewhat similar to that obtained by Karl Pearson [5] in connection with the evaluation of the G(r, ν) integrals of his Type IV frequency curve. If R > 1, the expansion of $\psi(R, \varphi)$ is asymptotic and the greatest numerical value that the mth term can have is

$\dfrac{B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{1}{R^{2m+1}}.$

Thus according to Lipschitz's results, the error committed in dropping all terms after the mth will not exceed:

The following table gives an indication of the size of the error:

Terms omitted after      Error committed in $\psi(R, \varphi)$ less than
       1st                     ±.0833 3333/R
       2nd                     ±.0027 7777/R^3
       3rd                     ±.0007 9365/R^5
       4th                     ±.0005 9524/R^7
       5th                     ±.0008 4175/R^9

It is now obvious that formula (17) will give satisfactory results whenever R is sufficiently large. The degree of accuracy required together with the value of R will determine the number of terms of $\psi(R, \varphi)$ to be computed.
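Formula (17) can be sketched in a few lines. The code assumes (17) reads $\log\Gamma(u+iv)\Gamma(u-iv) = \log 2\pi + (2u-1)\log R - 2(\varphi v + u) + 2\psi(R,\varphi)$ in natural logarithms, with $R = \sqrt{u^2+v^2}$ and $\varphi = \tan^{-1}(v/u)$, and takes the first five coefficients $B_{2m+1}/[(2m+1)(2m+2)]$ as 1/12, 1/360, 1/1260, 1/1680 and 5/5940. The names are hypothetical.

```python
import math

# Coefficients B_{2m+1} / ((2m+1)(2m+2)) for m = 0, ..., 4.
PSI_COEFFS = [1 / 12, 1 / 360, 1 / 1260, 1 / 1680, 5 / 5940]


def psi(big_r, phi, terms=1):
    """Leading terms of the asymptotic series psi(R, phi)."""
    return sum((-1) ** m * PSI_COEFFS[m] * math.cos((2 * m + 1) * phi)
               / big_r ** (2 * m + 1) for m in range(terms))


def log_gamma_conj_product(u, v, terms=1):
    """Natural log of Gamma(u + iv) * Gamma(u - iv) by formula (17)."""
    big_r = math.hypot(u, v)
    phi = math.atan2(v, u)
    return (math.log(2 * math.pi) + (2 * u - 1) * math.log(big_r)
            - 2 * (phi * v + u) + 2 * psi(big_r, phi, terms))


# Common logarithms, for comparison with the worked example for u = 19, v = 1
# and u = 20, v = 2.
print(log_gamma_conj_product(19, 1) / math.log(10))
print(log_gamma_conj_product(20, 2) / math.log(10))
```

With only the first term of $\psi$ retained, the error for R near 19 is of the order of $2 \times .0028/R^3$, which is below one part in a million.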

We now turn to the solution of the example under Case III and proceed to calculate $f_4$, $f_{15}$, and fm when $f_0$ = 29. We may write

$K = 29\,\dfrac{\Gamma(5 + 2i)\Gamma(5 - 2i)}{\Gamma(4 + i)\Gamma(4 - i)}.$

Application of formula (13) gives

$\Gamma(5 + 2i)\Gamma(5 - 2i) = 244.043\,648,$

$\Gamma(4 + i)\Gamma(4 - i) = 27.202\,292,$

from which K = 260.171 676, and

$f_x = 260.171\,676\,\dfrac{\Gamma(x + 4 + i)\Gamma(x + 4 - i)}{\Gamma(x + 5 + 2i)\Gamma(x + 5 - 2i)}.$

Again making use of formula (13) we have

$f_4 = 260.171\,676\,\dfrac{\Gamma(8 + i)\Gamma(8 - i)}{\Gamma(9 + 2i)\Gamma(9 - 2i)} = 5.6722,$

$f_{15} = 260.171\,676\,\dfrac{\Gamma(19 + i)\Gamma(19 - i)}{\Gamma(20 + 2i)\Gamma(20 - 2i)}.$

Since R is fairly large in this instance, formula (17) is used and all terms of $\psi(R, \varphi)$ after the first are dropped. This result gives

$\log \Gamma(19 + i)\Gamma(19 - i) = 31.5892\,259,$

$\log \Gamma(20 + 2i)\Gamma(20 - 2i) = 34.0812\,782.$



218 


A. C. COHEN, JR. 


Accordingly, log /« = 9.9232 071 -10 

and fu = .8379. 

By the same method /i» is calculated and we find fm — .008723. 

As a check on the accuracy of the results obtained in the above computations, 
values of/, for x ranging from 1 to 15 were computed, using the given equation 
as a recursion formula. That is 

fi = ^/« = 17, /, * ~fi = 11.05, etc. 

These results are given in the following table, and it is to be noted that the 
values in the table for fi and agree with those previously computed by use 
of formulas contained in this paper. For obvious reasons, no attempt was 
made to compute the value of fm by this method. 


TABLE I

  x      f_x        x      f_x        x      f_x
  0    29.0000      5     4.3375     10     1.6228
  1    17.0000      6     3.4200     11     1.3961
  2    11.0500      7     2.7633     12     1.2135
  3     7.7142      8     2.2779     13     1.0644
  4     5.6722      9     1.9092     14     0.9411
                                     15     0.8379
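The recursion check that produced the table can be sketched as follows, assuming the Case III difference equation $f_{x+1}/f_x = (x^2 + 8x + 17)/(x^2 + 10x + 29)$ and the starting value $f_0 = 29$; the function name is hypothetical.

```python
def f_by_recursion(x_max, f0=29.0):
    """Values f_0, ..., f_{x_max} from the Case III difference equation."""
    values = [f0]
    for x in range(x_max):
        ratio = (x * x + 8 * x + 17) / (x * x + 10 * x + 29)
        values.append(values[-1] * ratio)
    return values


for x, value in enumerate(f_by_recursion(15)):
    print(x, round(value, 4))
```

Each step costs one multiplication, so the recursion is convenient for a run of consecutive values of x and wasteful when only a single remote value is wanted.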


REFERENCES

[1] H. C. Carver, "On the Graduation of Frequency Distributions," Proc. of the Casualty Actuarial Soc. of America, Vol. 6 (1919), pp. 52-72.
[2] Guldberg and Wallenberg, Differenzengleichungen, B. G. Teubner, 1911, p. 25.
[3] Whittaker and Watson, Modern Analysis, Cambridge Univ. Press, 2nd Ed. 1911, pp. 229-258.
[4] Lipschitz, "Ueber die Darstellung gewisser Functionen durch die Eulersche Summenformel," Crelle's Journal, Bd. 56, s. 20.
[5] "Preliminary Report on Computation of G(r, ν) Integrals," Reports of British Assn. for Adv. Science, 1896, p. 70.

University of Michigan,
Ann Arbor, Mich.





COMPARISON OF PEARSONIAN APPROXIMATIONS WITH EXACT 
SAMPLING DISTRIBUTIONS OF MEANS AND VARIANCES 
IN SAMPLES FROM POPULATIONS COMPOSED OF 
THE SUMS OF NORMAL POPULATIONS 

By G. A. Baker

1. Introduction. Biological and sociological data are often "non-homogeneous" and of such a nature as not to be easily separated into components. Non-homogeneous populations have been discussed by Karl Pearson, Charlier, and others. Non-normal material has been discussed by many writers. See for example, A. E. R. Church [1] and J. M. LeRoux [2] for a discussion of moments of the distributions of the means and variances for samples from non-normal material.

In a previous paper [3] the author has given the distributions of the means 
and standard deviations of samples from certain non-homogeneous populations. 
The purpose of the present paper is to extend the results given in [3] and to 
compare the moment approach of the Pearsonian school with the true distri¬ 
butions. 


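2. Moments of the distribution of means of samples of n from a non-homogeneous population. Consider a population with distribution

(2.1) $f(x) = \dfrac{1}{(1 + k)\sqrt{2\pi}}\left[e^{-x^2/2} + \dfrac{k}{\sigma}\,e^{-(x - m)^2/2\sigma^2}\right].$

The first four moments of (2.1) about x = 0 are

(2.2)
$\nu'_1 = \dfrac{km}{1 + k},$
$\nu'_2 = \dfrac{1}{1 + k}\left[1 + k(\sigma^2 + m^2)\right],$
$\nu'_3 = \dfrac{km}{1 + k}\left[3\sigma^2 + m^2\right],$
$\nu'_4 = \dfrac{1}{1 + k}\left[3 + k(3\sigma^4 + 6m^2\sigma^2 + m^4)\right].$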
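The moments (2.2) can be checked against a direct numerical integration of the density. This is a minimal sketch, assuming (2.1) is the mixture of a unit normal with weight 1/(1 + k) and a normal with mean m and standard deviation σ with weight k/(1 + k); the function names, the integration range and the step count are hypothetical choices.

```python
import math


def raw_moments(k, sigma, m):
    """First four moments about x = 0, equations (2.2)."""
    w = 1.0 + k
    s2 = sigma * sigma
    return (k * m / w,
            (1 + k * (s2 + m * m)) / w,
            k * m * (3 * s2 + m * m) / w,
            (3 + k * (3 * s2 * s2 + 6 * m * m * s2 + m ** 4)) / w)


def density(x, k, sigma, m):
    """Population (2.1)."""
    first = math.exp(-0.5 * x * x)
    second = (k / sigma) * math.exp(-0.5 * ((x - m) / sigma) ** 2)
    return (first + second) / ((1 + k) * math.sqrt(2 * math.pi))


def numeric_moment(p, k, sigma, m, lo=-40.0, hi=40.0, steps=80_000):
    """p-th moment about zero by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x ** p * density(x, k, sigma, m)
    return total * h


# First parameter set of Table I: k = 1/2, sigma^2 = 1/4, m = 1.1.
k, sigma, m = 0.5, 0.5, 1.1
for p, formula in enumerate(raw_moments(k, sigma, m), start=1):
    print(p, formula, numeric_moment(p, k, sigma, m))
```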


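The means of samples of n drawn at random from (2.1) are distributed according to

(2.3) $y = \dfrac{n}{\sqrt{2\pi}\,(1 + k)^{n}} \sum_{s=0}^{n} \binom{n}{s} \dfrac{k^{s}}{\sqrt{s\sigma^2 + n - s}}\,\exp\left[-\dfrac{n^2}{2}\,\dfrac{\left(x - \tfrac{s}{n}m\right)^2}{s\sigma^2 + n - s}\right].$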

Denote by $m'_p$ the moments of (2.3) about x = 0 and by $m_p$ the moments about the mean. Then in view of the relations

(2.4) $\dfrac{n!}{(n - s)!\,s!}\,s^{r} = \sum_{i=0}^{r-1} A_{ri}\,\dfrac{n!}{(n - s)!\,(s - r + i)!},$

$A_{r(r-1)} = 1,\ A_{31} = 3,\ A_{41} = 6,\ A_{42} = 7,\ A_{51} = 10,\ A_{52} = 25,\ A_{53} = 15,$

and similar relations, and reduction to moments about the mean we obtain


(2.5) 


/ km i 
mi = r+ -£ = * 


rrii - ["3A(nA -f 1 )o- 4 + 3(n + A) + 6(n - 1)AV 

n*(l + A) 2 L 

+ {k + (» - l)}m 2 + JjL {(n - l)k + 11mV 


+ 


m 6 = 


{A: 2 + (3n - 4) A: + 1 
|^15{(2n — 1)A + l}wiff 4 — 15 {A + (2n — l))m 


(1 + ky 

A_ 

n 4 ('f+ A:)* 

+ 30 (n — 1)(1 — k)ma 


+ {- (n - 1)A 2 + 4(n - 1)A + ljmV 

+ ~- k {- A 2 - 4(n - 1)A + (n - 1 )}to* 

+ (1 ^ fc)2 { — A 8 + (— 10n + ll)A 2 + (10n-ll)A + l}m s 

The expressions for the first five moments agree with the results given by 
Church and Tchebycheff. 

The betas of (2.4) are 


iB, 


A 2 to 2 
n(l + A) 


* - 3 +m 
j»- +1 + 



8 


( 2 . 6 ) 







(2.7) 1 B 1 - 3 


- 3 * -k 


3 + 3<7 4 — 6«r* 


6 > , 6 

f+V” + l + fc™ ° + (1 + *)* 


>m 4 ] 




+ 1 + 


1 + * 


m>J 


$_1B_1$ vanishes if k = 0, m = 0, or k = 1 and σ = 1. If k and σ are constant and m approaches infinity $_1B_1$ approaches $(1 - k)^2/nk$. If k and m are constant and σ approaches infinity $_1B_1$ approaches zero. $_1B_2 - 3$ vanishes if k = 0, k = ∞, or if m = 0 and σ = 1. If k and σ are constant and m approaches


TABLE I 


nit, and ,m t compared for four sets of values of k, <r’, and m 


Sets of values 

k <r* m 


ptn§ 

1/2 1/4 1.1 

4.599 1.228 

n* + n 4 

f-f) 

- (>+£) 

1/3 1 3.2 

89.702 39.322 


n* n 4 

- (- ?) 

-1/4 1/4 0.6 

4.640 1.744 


n 3 n A 

; 

”■ (>+f) 

1 4 5.6 

1,302.840 646.060 

„ oe 347.204 2.406\ 

, ( 762.035-— ) 

1 \ » n* / 

n* n 4 

(«♦?) 


infinity then $_1B_2 - 3$ approaches $(k^2 - 4k + 1)/nk$. If k and m are constant and σ approaches infinity then $_1B_2 - 3$ approaches 3/nk.

It is of interest to compare the higher moments of (2.3) with the higher 
moments calculated from the first four moments on the assumption of a Pearson 
curve in place of (2.3). On this assumption 

. 2m, (rm + 7m,m 4 - 3m»m|) 

(2.8) ^ + 3m| * 

It is seen that (2.8) bears little resemblance to $m_5$. If we consider the difference $_pm_5 - m_5$ we see that it is of the same order in 1/n as is $m_5$ and the numerator is of the 16th degree in k, m, and σ; a very complicated locus. $m_5$ and $_pm_5$ are compared for certain values of the parameters of (2.1) in Table I.

Table I shows that the coefficients of $1/n^3$ in the expressions for $m_5$ and $_pm_5$ differ by from two to more than 40 per cent. The coefficients of $1/n^4$ differ even more. The assumption of Karl Pearson's curves to represent the distribution of means of samples of n from non-homogeneous populations seems to be adequate in some cases but inadequate in others even for moderate values of the parameters.


3. Moments of the distribution of variances. In [3] an estimate of n times 
the standard deviation squared is expressed as 

(3.1) W = (w — s) o\ + sal + --~ (ffh + nb) 2 , 

n 

where a bar over a letter means an estimate of the corresponding population parameter and where (n - s) denotes the number drawn from the first component of (2.1) and s denotes the number from the second component.

For the direct calculation of the moments of the distribution of variances 
it is easier not to use the distribution given in [3], but to proceed as follows. Put 

(n — s) ol = y, so 2 = x, — - — (ifii + m») 2 = z. 


Of course, for population (2.1) $\sigma_1 = 1$, $\sigma_2 = \sigma$, $m_1 = 0$, $m_2 = m$. The variables x, y, z are all independent in the probability sense and their probability distributions are well known. Hence the moments of


(3.2) 


W _ x + y + z 
n n 


can be directly calculated. 

For instance, if p = 1 then 

(3.3) M[ - "-“- 1 1^5 [*«' + 1 + , | t m ’] • 

In general, of course, the moments about the mean check with the values given 
by Church. 

It is generally recommended to represent the distributions of variances of 
samples from non-normal parents by Pearson’s curves. Let us examine the 
results of this procedure in a special case. 

Suppose that the sampled population is 

(3.4) fix) = [e -1 ** + . 

The first eight moments of (3.4) which are needed in the calculation of the first 
four moments of the variances are: 






(3.5) 


Vi = 1.7000 v t - o 

Vt = 3.8900 v t = 294.47 

Dj s= 0 t>7 = 0 


04 = 28.692 08 = 3,818.4. 





Fig. 1. Comparison of the True Distribution of the Variances of Samples of 4 Drawn from the Non-Homogeneous Population (3.4) with the Corresponding Empirical Pearson Curve


The first four moments of the variances of samples of 4 from (3.4) are: 

= 2.918 - 4.745 

(3.6) 

iM t = 3.396 j Mi = 41.52. 

Hence $_2B_1 = .60$ and $_2B_2 = 3.6$, κ = -.87 which calls for a Type I curve. The equation of the curve is


(3.7) 








with its origin at its mode. The corresponding true distribution with the 
origin at the beginning of the range is 

yt = c'**[.3989V* + .003550sinh (3.4^3*) 

(3.8) 

+ .0005454 sinh (6.8\/*)]- 

Distribution (3.8) differs slightly from the corresponding result given in [3] 
because of an error in that paper. 

The two distributions are compared in Figure 1. It is seen that the two 
distributions are quite different. As the number of components of distributions 
similar to (3.8) increases, which is true as n increases, the distributions may 
be expected to become smoother and more closely representable by a single 
smooth curve. 

4. Summary. The moments of the distribution of the means of samples of n 
from a non-homogeneous population composed of two normal components are 
given up to and including the fifth. This fifth moment is compared with the
fifth moment calculated on the assumption of Pearson’s curves to represent 
the distribution of means. The B’s of the distributions of the means are dis¬ 
cussed in certain limiting cases. It appears that for small samples and extreme 
values of the parameters, and in some cases of moderate values of the parame¬ 
ters, the Pearsonian approximations give poor results. 

Some identities involving the binomial coefficients are given which permit 
the reduction of the moments of the distribution of means calculated directly 
to forms given elsewhere [1]. A method is given for the direct calculation of 
the moments of the variances of samples from a non-homogeneous population 
composed of two normal components. An indication of the closeness with 
which a Pearson curve can be made to fit the distribution of variances in small 
samples from a non-homogeneous population is given in Figure 1. 

REFERENCES 

[1] A. E. R. Church, "On the means and squared standard-deviations of small samples from any population," Biometrika, Vol. 18 (1926), pp. 321-394.

[2] J. M. LeRoux, "A study of the distribution of the variance in small samples," Biometrika, Vol. 23 (1931), pp. 134-190.

[3] G. A. Baker, "Random Sampling from non-homogeneous populations," Metron, Vol. 8, no. 3, (1930).


University of California, 
Davis, Calif. 



A LEAST SQUARES ACCUMULATION THEOREM

By W. E. Bleick

The following simple least squares theorem does not seem to have been men¬ 
tioned in the literature, and has at least one practical application. 

If A*(x) and B*(x) are polynomials of the same degree which are least squares 
representations of the functions A(x) and B(x) respectively, for the values 
*1 > , • • • , Xp , then 

(1) £ A*(x t )B(x t ) - t, A(x t )B*(x t ) - £ A*(x t )B*(x t ). 


To prove the theorem let 


A*(x) = £ OiX* 

4-0 


(3) B*(x) = £ fc,V. 

i -o 

Then the normal equations for the determination of o< and 6/ are 


m p 

£ a< 8i+k = £ x k ,A(x,), 


k = 0, 1, 2, •.., m, 


£ 6,-s J+ * = £ zt 


h — 0,1, 2, • • • f tip 


where * r = £ x * • 

i-i 


Hence, by (2) and (5) 


i-i i-i L<-« J 

= £<*•£ a;!Bfe) 


<-0 t-1 


= £ £ OibfSi+t if » £ to, 

= £ if n ^ to. 


Similarly it can be shown that 


£ A(xt)B*(x t ) = £ A*(a;i)B*(a; ( ) if to ^ n. 



226 


W. E. BLBICK 


Combining (6) and (7) we have

(8)   $\sum_{t=1}^{p} A^*(x_t)B(x_t) = \sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t)$   if $m = n$.

In the particular case $A(x) \equiv B(x)$, equation (8) gives the interesting result

(9)   $\sum_{t=1}^{p} A^*(x_t)[A(x_t) - A^*(x_t)] = 0.$

An obvious extension of equation (6) is

(10)   $\sum_{t=1}^{p} x_t^q A^*(x_t)B(x_t) = \sum_{t=1}^{p} x_t^q A^*(x_t)B^*(x_t)$,   if $n \ge m + q$,

where q is a positive integer.
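The identity (8) can be checked numerically. The following is a minimal sketch, not part of the original note: it solves the normal equations (4) and (5) in exact rational arithmetic and compares the three sums. The function names (`solve`, `least_squares_poly`, `theorem_sums`) and the sample data are illustrative assumptions.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def least_squares_poly(xs, ys, degree):
    """Coefficients a_0..a_degree from the normal equations sum_i a_i s_{i+k} = sum_t x_t^k y_t."""
    s = [sum(Fraction(x) ** r for x in xs) for r in range(2 * degree + 1)]
    matrix = [[s[i + k] for i in range(degree + 1)] for k in range(degree + 1)]
    rhs = [sum(Fraction(x) ** k * Fraction(y) for x, y in zip(xs, ys))
           for k in range(degree + 1)]
    return solve(matrix, rhs)


def evaluate(coeffs, x):
    """Value of the polynomial with the given coefficients at x."""
    return sum(c * Fraction(x) ** i for i, c in enumerate(coeffs))


def theorem_sums(xs, a_vals, b_vals, degree):
    """Return (sum A*B, sum A B*, sum A*B*) for fits of the same degree (m = n)."""
    a_star = [evaluate(least_squares_poly(xs, a_vals, degree), x) for x in xs]
    b_star = [evaluate(least_squares_poly(xs, b_vals, degree), x) for x in xs]
    return (sum(p * q for p, q in zip(a_star, b_vals)),
            sum(p * q for p, q in zip(a_vals, b_star)),
            sum(p * q for p, q in zip(a_star, b_star)))


# Arbitrary illustrative data on x = 1, ..., 6 with quadratic fits.
sums = theorem_sums([1, 2, 3, 4, 5, 6], [3, 1, 4, 1, 5, 9], [2, 7, 1, 8, 2, 8], 2)
```

Exact fractions are used so that the three sums agree identically rather than only to rounding error.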

A practical application of (8) has been made by one large insurance company
in the case $m = n = 1$. Suppose that $A(x)$ represents an annual payment
made x years ago and is an approximately linear function, and that $B(x)$ represents
a compound interest function. Then, even if $B(x)$ is not a linear function,
we may write approximately

(11)   $\sum_{x=1}^{p} A(x)B(x) \cong \sum_{x=1}^{p} A(x)B^*(x)$

       $= \sum_{x=1}^{p} A(x)(b_0 + b_1 x)$

       $= b_0 \sum_{x=1}^{p} A(x) + b_1 \sum_{x=1}^{p} x A(x)$

       $= b_0 \sum_{x=1}^{p} A(x) + b_1 \sum_{x=1}^{p} \sum_{y=x}^{p} A(y).$

Thus if a year-by-year record is kept of the annual payments $A(x)$, the sum
$\sum_{x=1}^{p} A(x)$, and the double sum $\sum_{x=1}^{p}\sum_{y=x}^{p} A(y)$, and if $b_0$ and $b_1$ are tabulated
functions of p, equation (11) affords a convenient method of evaluating
$\sum_{x=1}^{p} A(x)B(x)$ approximately.

The author wishes to acknowledge that the case $m = n = 1$ of equation (8)
and the above application were brought to his attention by John K. Dyer.
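The accumulation formula (11) can be sketched as follows. This is an illustration only: the interest function $B(x) = (1+i)^x$, the rate, and the function names are assumptions made for the example, not details given in the note.

```python
def linear_least_squares(xs, ys):
    """Intercept b0 and slope b1 of the least squares line through (xs, ys)."""
    p = len(xs)
    mean_x = sum(xs) / p
    mean_y = sum(ys) / p
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def accumulated_value(payments, rate):
    """Approximate sum_x A(x) B(x) by b0 * sum A + b1 * double sum, as in (11).

    payments[x - 1] holds A(x) for x = 1..p; B(x) = (1 + rate) ** x is an assumed
    form of the compound interest function.
    """
    p = len(payments)
    xs = list(range(1, p + 1))
    b0, b1 = linear_least_squares(xs, [(1.0 + rate) ** x for x in xs])
    single_sum = sum(payments)
    # Double sum: for each x, the tail sum A(x) + A(x+1) + ... + A(p).
    running = 0.0
    double_sum = 0.0
    for a in reversed(payments):
        running += a
        double_sum += running
    return b0 * single_sum + b1 * double_sum


# Roughly linear payments (hypothetical figures) at an assumed 3% rate.
payments = [100.0, 106.0, 109.0, 116.0, 120.0, 124.0, 131.0, 134.0]
exact = sum(a * 1.03 ** x for x, a in enumerate(payments, start=1))
approx = accumulated_value(payments, 0.03)
```

When the payments are exactly linear in x the approximation is exact, since the residuals of $B$ from $B^*$ are orthogonal to every linear function on the points $1, \cdots, p$.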


Cooper Union, New York, N. Y. 



PARABOLIC TEST FOR LINKAGE 

By N. L. Johnson

1. Introduction. In this paper a problem in testing statistical hypotheses 
which has applications in genetics will be treated from the standpoint of the 
Neyman-Pearson approach. This approach has been developed in a series of 
papers, [4], [5], [6], [7], [8], [9], [10], to which the reader is referred for definitions
of the concepts of a simple statistical hypothesis, critical regions, power function 
of a test with respect to alternative hypotheses, and that of a test unbiased in 
the limit employed in the present paper. 

2. Statement of Problem. We shall consider M independent experiments,
which will each yield results falling into one of the four categories described by
the possible combinations of the 4 events a, not-a (or $\bar a$), b, and not-b (or $\bar b$)
as set up in the following table.

              a         not-a
   b          $p_1$     $p_2$        $P_1$
   not-b      $p_3$     $p_4$        $1 - P_1$
              $P_2$     $1 - P_2$    1

We shall assume that the marginal probabilities are known and have values
$P_1$, $1 - P_1$, $P_2$, $1 - P_2$ as shown in the table. Thus $P_1$ = probability of
event b happening whether event a occurs or not. It is obvious that if, further,
the probability of a result falling in any one category or cell is fixed, then the
other three cell probabilities will also be fixed. For if $p_1, p_2, p_3, p_4$ be the
four cell probabilities as shown in the table above, we must have

(1)   $p_1 + p_2 = P_1; \quad p_1 + p_3 = P_2; \quad p_3 + p_4 = 1 - P_1.$

Hence the values of the cell probabilities will be determined by a single parameter
$\theta$, say, as follows

(2)   $p_1 = P_1P_2e^{\theta}, \qquad p_2 = P_1(1 - P_2e^{\theta}),$

      $p_3 = P_2(1 - P_1e^{\theta}), \qquad p_4 = 1 - P_1 - P_2 + P_1P_2e^{\theta}.$

The range of values which $\theta$ may take for the set of admissible hypotheses is
found from the conditions

(3)   $0 < p_i < 1 \qquad (i = 1, 2, 3, 4)$

to be

(4)   $-\infty < \theta < \min(-\log P_1, -\log P_2)$   if $P_1 + P_2 \le 1$

but

(5)   $\log(P_1^{-1} + P_2^{-1} - P_1^{-1}P_2^{-1}) < \theta < \min(-\log P_1, -\log P_2)$   if $P_1 + P_2 > 1$.
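The parametrization (2) and the admissible range (4), (5) can be made concrete with a short sketch. The function names are illustrative assumptions; the formulas are those of (2), (4) and (5).

```python
import math


def cell_probabilities(P1, P2, theta):
    """Cell probabilities (p1, p2, p3, p4) of equation (2)."""
    e = math.exp(theta)
    p1 = P1 * P2 * e
    p2 = P1 * (1.0 - P2 * e)
    p3 = P2 * (1.0 - P1 * e)
    p4 = 1.0 - P1 - P2 + P1 * P2 * e
    return p1, p2, p3, p4


def theta_range(P1, P2):
    """Open interval (lower, upper) of admissible theta from (4) and (5)."""
    upper = min(-math.log(P1), -math.log(P2))
    if P1 + P2 <= 1.0:
        lower = -math.inf
    else:
        lower = math.log(1.0 / P1 + 1.0 / P2 - 1.0 / (P1 * P2))
    return lower, upper


# With P1 = P2 = 1/4 the upper limit is log 4, about 1.386.
lower, upper = theta_range(0.25, 0.25)
probs = cell_probabilities(0.25, 0.25, 0.3)
```

At $\theta = 0$ the cell probabilities reduce to the products of the marginal probabilities, which is the hypothesis of independence.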

The hypothesis tested, $H_0$, is that $\theta = 0$, i.e. that the events a and b are
independent. It will be noticed that $H_0$ is a simple hypothesis, since it specifies
the probability law of the observed variables completely. In fact, if $m_i$ be
the number of results out of our M experiments which are in the ith category,
then $m_1, m_2, m_3, m_4$ are our observed variables, and we have

(6)   $P(m_1 = m_1', m_2 = m_2', m_3 = m_3', m_4 = m_4' \mid H_0) = \dfrac{M!}{m_1'!\, m_2'!\, m_3'!\, m_4'!}\; p_{01}^{m_1'} p_{02}^{m_2'} p_{03}^{m_3'} p_{04}^{m_4'}$

where $p_{0i}$ is the value of $p_i$ when $\theta = 0$.

This is the conceptual model used in testing for linkage in two pairs of genes; 
Ho corresponds to the hypothesis “there is no linkage.” Fuller explanations 
are given by Fisher [3]. It should be noted, however, that Fisher uses a pa¬ 
rameter 6 corresponding to in this paper. 


3. Basis of Selection of Test. The question now arises: what test shall we
choose for the hypothesis $H_0$? That is, what should the critical region w be
to give us results as satisfactory as possible? The main aim must be to avoid
errors, both of first and second kind, as far as possible. The first kind of error
is subject to control, since the probability of the sample point E falling in w
when $H_0$ is true (which we shall denote by $P\{E \in w \mid H_0\}$) can be determined
approximately, $H_0$ being simple. The critical region w is therefore chosen, if
possible, to give a definite level of significance to the test associated with it.
However, there will usually be many regions which will do this, and in
order to decide which of them give more satisfactory results we consider
$(1 - P\{E \in w \mid H\})$; i.e. the probability of the second kind of error with respect
to an alternative hypothesis H, the first kind of error being fixed.

In the present case H will be determined by $\theta$ and so we may put
$P\{E \in w \mid H\} = \beta(w \mid \theta)$, where $\beta(w \mid \theta)$, considered as a function of $\theta$, will be
the power function of the test associated with the critical region w. We want
w to be such that $\beta(w \mid 0) = \alpha$, $\alpha$ being the fixed level of significance, while
$\beta(w \mid \theta)$ is as large as possible.

It is also desirable that we should accept the hypothesis $H_0$ more often when
it is true than when any one of the alternative hypotheses (H) is true. Expressed
symbolically, this means that

(7)   $\beta(w \mid 0) < \beta(w \mid \theta)$   for all $\theta \ne 0$.

Any test satisfying the last condition is said to be unbiased.

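If $\beta$ and $\partial\beta/\partial\theta$ are each continuous and differentiable functions of $\theta$, and we
consider only those alternative hypotheses specified by suitably small values
of $\theta$, sufficient conditions for the test to be unbiased will be

(8)   $\dfrac{\partial\beta}{\partial\theta}\Big]_{\theta=0} = 0,$

(9)   $\dfrac{\partial^2\beta}{\partial\theta^2}\Big]_{\theta=0} > 0.$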

According to the terminology recently adopted by Daly [1], the tests of
which it is known only that they satisfy (8) and (9), are called locally unbiased.
If a region w could be found such that, v being any other region for which

(10)   $\beta(w \mid 0) = \beta(v \mid 0)$, then $\beta(w \mid \theta) > \beta(v \mid \theta)$

for all $\theta \ne 0$, this would give a test which would be the best with respect to any
alternative hypothesis. However, it has been shown by Neyman [4] that under
certain conditions, which many probability laws satisfy, such a test will not
exist. An attempt is therefore made to control the power of the test with
respect to hypotheses specifying values of $\theta$ near to 0, hoping that the powers
of the tests so obtained with respect to the other hypotheses will behave in a
satisfactory manner. Thus Neyman and Pearson [9] define an “unbiased test
of Type A” as a test corresponding to a critical region w such that if v be any
other region in the sample space W for which

(11)   $\beta(w \mid 0) = \beta(v \mid 0) = \alpha$

and

(12)   $\dfrac{\partial\beta(w \mid \theta)}{\partial\theta}\Big]_{\theta=0} = \dfrac{\partial\beta(v \mid \theta)}{\partial\theta}\Big]_{\theta=0} = 0$

then

(13)   $\dfrac{\partial^2\beta(w \mid \theta)}{\partial\theta^2}\Big]_{\theta=0} \ge \dfrac{\partial^2\beta(v \mid \theta)}{\partial\theta^2}\Big]_{\theta=0}.$


In the problem which I am treating the conditions

(14)   $\beta(w \mid 0) = \alpha; \qquad \dfrac{\partial\beta(w \mid \theta)}{\partial\theta}\Big]_{\theta=0} = 0$

implied by (11) and (12) above cannot, in general, be satisfied, since the distribution
is discontinuous, i.e. $P\{E \in w \mid H_0\}$ is a discontinuous function of w and, in
fact, for a given sample size, has only a finite number of possible values, none
of which need be equal to $\alpha$.

However, it may be possible to find a test of $H_0$ of a type called “unbiased
in the limit (as M increases),” based on the limiting form of the multinomial
distribution, which is a continuous function of w. The definition [6] of a test
“unbiased in the limit” will be taken as follows:

Suppose we have a sequence $(w_M)$ of critical regions, $w_M$ corresponding to a
sample of size M, such that

(i) for any M, if $v_M$ be any region for which

(15)   $\beta(w_M \mid 0) = \beta(v_M \mid 0)$

and

(16)   $\dfrac{\partial\beta(w_M \mid \theta)}{\partial\theta}\Big]_{\theta=0} = \dfrac{\partial\beta(v_M \mid \theta)}{\partial\theta}\Big]_{\theta=0}$

then

(17)   $\dfrac{\partial^2\beta(w_M \mid \theta)}{\partial\theta^2}\Big]_{\theta=0} \ge \dfrac{\partial^2\beta(v_M \mid \theta)}{\partial\theta^2}\Big]_{\theta=0},$

(ii)

(18)   $\lim_{M\to\infty} \beta(w_M \mid 0) = \alpha,$

(iii) if

(19)   $\vartheta = \sqrt{M}(\theta - 0) = \sqrt{M}\,\theta$

(20)   $\lim_{M\to\infty} \dfrac{\partial\beta(w_M \mid \theta)}{\partial\vartheta}\Big]_{\vartheta=0} = 0$

then the test associated with this sequence of critical regions is unbiased in the
limit. I shall call such a test a test of type $A_\infty$.

The reason for using $\vartheta$ as the variable in condition (19) above is that, unless
our sequence of critical regions has been very badly or unluckily chosen, we
shall have

(21)   $\lim_{M\to\infty} \beta(w_M \mid \theta) = 1 \qquad (\theta \ne 0)$

while, by (18), $\lim_{M\to\infty} \beta(w_M \mid 0) = \alpha$ and so, in general, $\lim_{M\to\infty} \dfrac{\partial\beta(w_M \mid \theta)}{\partial\theta}$ will not
exist at $\theta = 0$. Hence we introduce $\vartheta$, termed the normalized error, and, keeping
$\vartheta$ constant (and hence making $\theta$ tend to zero) we form $\lim_{M\to\infty} \dfrac{\partial\beta}{\partial\vartheta}$.

In the next section will be obtained a test of $H_0$ which is of type $A_\infty$.


4. Derivation of Test. The composition of a sample of M experiments is
uniquely determined by the numbers of results $m_1, m_2, m_3$ falling in the 1st,
2nd and 3rd categories respectively. Thus any sample may be represented by a
point $E(m)$ in a three-dimensional sample space $W(m)$ with coordinate axes of
$m_1$, $m_2$, and $m_3$. It will occasionally be convenient to represent the sample
by a point in a three-dimensional space with other axes. The following sample
spaces will be used.

   $W(m)$-space with coordinate axes of $m_1, m_2, m_3$
   $W(d)$-space with coordinate axes of $d_1, d_2, d_3$
   $W(x)$-space with coordinate axes of $x_1, x_2, x_3$
   $W(n)$-space with coordinate axes of $n_1, n_2, n_3$

where

(22)   $d_i = m_i - Mp_{0i} \qquad (i = 1, 2, 3, 4)$

(23)   $x_i = (m_i - Mp_{0i})(Mp_{0i})^{-1/2} \qquad (i = 1, 2, 3, 4)$

(24)   $n_i = m_i/M \qquad (i = 1, 2, 3, 4).$

I shall use $w_M$ indifferently to denote “the critical region corresponding to
sample size M” in any of the four sample spaces above; E indifferently to
denote corresponding positions of the sample point in any of the four sample
spaces: except in cases where confusion might arise, where I shall use $w_M(m)$,
$w_M(d)$, $w_M(x)$, $w_M(n)$ and $E(m)$, $E(d)$, $E(x)$, $E(n)$. When necessary the size of
sample with which a point E is associated will be denoted by a subscript; e.g. $E_M$.
In finding a test of type $A_\infty$ we shall need to consider the quantities

«».|°),- >VM. 

The probability law of the observed values $m_1, m_2, m_3$ is discontinuous with
respect to the points of the sample space $W(m)$. For if $E^0$ be a point which
corresponds to integral values $m_1^0, m_2^0, m_3^0$ of $m_1, m_2, m_3$, subject to the
restrictions

(25)   $0 \le m_i^0 \qquad (i = 1, 2, 3)$

(26)   $0 \le \sum_{i=1}^{3} m_i^0 \le M$

then

(27)   $P\{E_M \equiv E^0 \mid \theta = 0\} = \dfrac{M!}{m_1^0!\, m_2^0!\, m_3^0!\, m_4^0!}\; p_{01}^{m_1^0} p_{02}^{m_2^0} p_{03}^{m_3^0} p_{04}^{m_4^0}$

where

(28)   $\sum_{i=1}^{4} m_i^0 = M$

and

(29)   $p_{01} = P_1P_2, \quad p_{02} = P_1(1 - P_2), \quad p_{03} = P_2(1 - P_1), \quad p_{04} = (1 - P_1)(1 - P_2),$

while if $E^0$ be not such a point

(30)   $P\{E_M \equiv E^0 \mid \theta\} = 0$

whatever the value of $\theta$ may be. Now

(31)   $\beta(w_M \mid \theta) = \sum_{w_M} \dfrac{M!}{m_1'!\, m_2'!\, m_3'!\, m_4'!}\; p_1^{m_1'} p_2^{m_2'} p_3^{m_3'} p_4^{m_4'}$

where $p_1, p_2, p_3, p_4$ are as defined in (2) above, and $\sum_{w_M}$ denotes a finite summation
over all points $E'$ in $w_M$ for which $P\{E_M \equiv E' \mid \theta\} \ne 0$. Differentiating
each side of (31) with respect to $\theta$, we get


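(32)   $\dfrac{\partial\beta(w_M \mid \theta)}{\partial\theta}\Big]_{\theta=0} = \sum_{w_M} \dfrac{M!\; p_{01}^{m_1'} p_{02}^{m_2'} p_{03}^{m_3'} p_{04}^{m_4'}}{m_1'!\, m_2'!\, m_3'!\, m_4'!} \left[\dfrac{m_1'(1 - P_1 - P_2) - m_2'P_2 - m_3'P_1 + MP_1P_2}{(1 - P_1)(1 - P_2)}\right]$

and

(33)   $\dfrac{\partial^2\beta(w_M \mid \theta)}{\partial\theta^2}\Big]_{\theta=0} = \sum_{w_M} \dfrac{M!\; p_{01}^{m_1'} p_{02}^{m_2'} p_{03}^{m_3'} p_{04}^{m_4'}}{m_1'!\, m_2'!\, m_3'!\, m_4'!} \cdot \dfrac{1}{(1 - P_1)^2(1 - P_2)^2}$
       $\times \big[\{m_1'(1 - P_1 - P_2) - m_2'P_2 - m_3'P_1 + MP_1P_2\}^2$
       $- \{m_1'P_1P_2(1 - P_1 - P_2) + m_2'P_2(1 - P_1 - P_1P_2)$
       $+ m_3'P_1(1 - P_2 - P_1P_2) - MP_1P_2(1 - P_1)(1 - P_2)\}\big].$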
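The bracketed factor in (32) is the derivative at $\theta = 0$ of the logarithm of a single multinomial term in (31). The sketch below, which is an illustration and not part of the paper, compares that closed form with a central finite difference; the function names and the step size are assumptions.

```python
import math


def log_probability(m, P1, P2, theta):
    """Log of the multinomial probability of counts m = (m1, m2, m3, m4) under (2)."""
    e = math.exp(theta)
    p = (P1 * P2 * e,
         P1 * (1.0 - P2 * e),
         P2 * (1.0 - P1 * e),
         1.0 - P1 - P2 + P1 * P2 * e)
    M = sum(m)
    log_coeff = math.lgamma(M + 1) - sum(math.lgamma(k + 1) for k in m)
    return log_coeff + sum(k * math.log(q) for k, q in zip(m, p))


def score_at_zero(m, P1, P2):
    """Bracketed factor of (32): d/d(theta) of the log probability at theta = 0."""
    m1, m2, m3, _ = m
    M = sum(m)
    numerator = m1 * (1.0 - P1 - P2) - m2 * P2 - m3 * P1 + M * P1 * P2
    return numerator / ((1.0 - P1) * (1.0 - P2))


def finite_difference_score(m, P1, P2, h=1e-6):
    """Central difference approximation to the same derivative."""
    return (log_probability(m, P1, P2, h) - log_probability(m, P1, P2, -h)) / (2.0 * h)
```

Agreement of the two values for arbitrary counts gives a quick check that the signs and the term $MP_1P_2$ in (32) have been read correctly.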
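Theorem 1. The sequence of critical regions $(w_M)$ defined by

(34)   $v + Bu^2 \ge A$ in $w_M$;   $v + Bu^2 < A$ elsewhere,

where

(35)   $u = \dfrac{x_1(P_1P_2)^{1/2}(1 - P_1 - P_2) - x_2P_1^{1/2}(1 - P_2)^{1/2}P_2 - x_3P_2^{1/2}(1 - P_1)^{1/2}P_1}{\{P_1P_2(1 - P_1)(1 - P_2)\}^{1/2}}$

(36)   $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)\{x_1(P_1P_2)^{1/2} + x_3P_2^{1/2}(1 - P_1)^{1/2}\} + P_2(1 - P_2)(2P_1 - 1)\{x_1(P_1P_2)^{1/2} + x_2P_1^{1/2}(1 - P_2)^{1/2}\}}{[P_1P_2(1 - P_1)(1 - P_2)\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}]^{1/2}}$

(37)   $B = \left[\dfrac{MP_1P_2(1 - P_1)(1 - P_2)}{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2}\right]^{1/2}$

(38)   $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha$

and $x_i = \dfrac{m_i - Mp_{0i}}{(Mp_{0i})^{1/2}}$ as defined above, is associated with a test of the hypothesis
$H_0(\theta = 0)$ which is unbiased in the limit, of type $A_\infty$ at level of significance $\alpha$,
provided that

(39)   $0 < P_i < 1$

and $P_1$ and $P_2$ are not both equal to $\tfrac12$.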

In Lemma 1 of the Appendix (paragraph 9), put $s = 2$, and let

   $f_1$ = individual members of the summation for $\beta(w_M \mid 0)$   (see (31))
   $f_2$ = individual members of the summation for $\dfrac{\partial\beta(w_M \mid \theta)}{\partial\theta}\Big]_{\theta=0}$   (see (32))
   $f_0$ = individual members of the summation for $\dfrac{\partial^2\beta(w_M \mid \theta)}{\partial\theta^2}\Big]_{\theta=0}$   (see (33)).

From Lemma 1 we see that the regions (w) defined by

(40)   $f_0 \ge a_1f_1 + a_2f_2$ in w,   $f_0 < a_1f_1 + a_2f_2$ elsewhere

will maximize $\sum_w f_0$ with respect to all regions for which $\sum_w f_1$ and $\sum_w f_2$ are fixed
($a_1$ and $a_2$ are arbitrary constants depending on the fixed values of $\sum_w f_1$ and
$\sum_w f_2$). Hence any sequence of critical regions $(w_M)$ defined by

(41)   $\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2\}^2$
       $- \{m_1P_1P_2(1 - P_1 - P_2) + m_2P_2(1 - P_1 - P_1P_2) + m_3P_1(1 - P_2 - P_1P_2) - MP_1P_2(1 - P_1)(1 - P_2)\}$
       $\ge a_3\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2\} + a_4$

in $w_M$, will satisfy conditions (i) given above in the definition of a test of
type $A_\infty$. The inequality (41) may be rewritten

(42)   $\{m_1(1 - P_1 - P_2) - m_2P_2 - m_3P_1 + MP_1P_2 - a_5\}^2$
       $- [P_2(1 - P_1)\{m_2 - MP_1(1 - P_2)\} + P_1(1 - P_2)\{m_3 - MP_2(1 - P_1)\}] \ge a_4$

the $a_i$'s being arbitrary constants.

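Also, by Theorem 1 of the Appendix, we have that, for any given $\epsilon > 0$
and any region w, there is a number $M_\epsilon$ independent of w and such that for all
$M > M_\epsilon$,

(43)   $|\beta(w \mid 0) - I(w)| < \epsilon$

where

(44)   $I(w) = \dfrac{1}{(2\pi)^{3/2}\, p_{04}^{1/2}} \iiint_{w(x)} e^{-\frac12\chi_0^2}\, dx_1\, dx_2\, dx_3$

and

(45)   $\chi_0^2 = \sum_{i=1}^{3} x_i^2(1 + p_{0i}\,p_{04}^{-1}) + 2\sum_{i<j\le 3} x_ix_j(p_{0i}\,p_{0j})^{1/2}\, p_{04}^{-1}.$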


We will now apply a transformation to the coordinates $x_1, x_2, x_3$ which will

(a) transform inequality (42) into a simpler form,

(b) transform $I(w)$ into a form to which the tables of the Normal Probability
Integral may easily be applied for purposes of calculation.

This transformation is

(46)   $u = \dfrac{x_1(P_1P_2)^{1/2}(1 - P_1 - P_2) - x_2P_1^{1/2}(1 - P_2)^{1/2}P_2 - x_3P_2^{1/2}(1 - P_1)^{1/2}P_1}{\{P_1P_2(1 - P_1)(1 - P_2)\}^{1/2}}$

(47)   $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)\{x_1(P_1P_2)^{1/2} + x_3P_2^{1/2}(1 - P_1)^{1/2}\} + P_2(1 - P_2)(2P_1 - 1)\{x_1(P_1P_2)^{1/2} + x_2P_1^{1/2}(1 - P_2)^{1/2}\}}{[P_1P_2(1 - P_1)(1 - P_2)\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}]^{1/2}}$

(48)   $t = \dfrac{(2P_1 - 1)\{x_1(P_1P_2)^{1/2} + x_3P_2^{1/2}(1 - P_1)^{1/2}\} - (2P_2 - 1)\{x_1(P_1P_2)^{1/2} + x_2P_1^{1/2}(1 - P_2)^{1/2}\}}{\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}^{1/2}}$

This is a proper transformation, since under the conditions of the theorem
$0 < P_i < 1$ and $P_1$ and $P_2$ are not both $\tfrac12$, and the Jacobian

(49)   $J = \dfrac{\partial(u, v, t)}{\partial(x_1, x_2, x_3)} = \dfrac{1}{p_{04}^{1/2}}$

is non-zero and of constant sign.
Also

(50)   $\chi_0^2 = u^2 + v^2 + t^2.$

Hence

(51)   $I(w) = \dfrac{1}{(2\pi)^{3/2}} \iiint_{w(u,v,t)} e^{-\frac12(u^2+v^2+t^2)}\, du\, dv\, dt.$

The inequality (42) is transformed into an inequality of form $B(u - a_6)^2 + v \ge A$
where B has the value stated above; $a_6$ and A being at present arbitrary
constants.
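Relation (50) lends itself to a direct numerical check. The sketch below evaluates $u$, $v$, $t$ from (46)-(48) and $\chi_0^2$ from (45) at an arbitrary point; the function names and the chosen test point are illustrative assumptions.

```python
import math


def uvt(x1, x2, x3, P1, P2):
    """The transformed coordinates (u, v, t) of equations (46)-(48)."""
    r1 = math.sqrt(P1 * P2)            # sqrt(p01)
    r2 = math.sqrt(P1 * (1.0 - P2))    # sqrt(p02)
    r3 = math.sqrt(P2 * (1.0 - P1))    # sqrt(p03)
    D = P1 * P2 * (1.0 - P1) * (1.0 - P2)
    K = P1 * (1.0 - P1) * (1.0 - 2.0 * P2) ** 2 + P2 * (1.0 - P2) * (1.0 - 2.0 * P1) ** 2
    col = x1 * r1 + x3 * r3            # x1 (P1 P2)^(1/2) + x3 P2^(1/2) (1 - P1)^(1/2)
    row = x1 * r1 + x2 * r2            # x1 (P1 P2)^(1/2) + x2 P1^(1/2) (1 - P2)^(1/2)
    u = (x1 * r1 * (1.0 - P1 - P2) - x2 * r2 * P2 - x3 * r3 * P1) / math.sqrt(D)
    v = (P1 * (1.0 - P1) * (2.0 * P2 - 1.0) * col
         + P2 * (1.0 - P2) * (2.0 * P1 - 1.0) * row) / math.sqrt(D * K)
    t = ((2.0 * P1 - 1.0) * col - (2.0 * P2 - 1.0) * row) / math.sqrt(K)
    return u, v, t


def chi_square_0(x1, x2, x3, P1, P2):
    """The quadratic form of equation (45)."""
    p0 = (P1 * P2, P1 * (1.0 - P2), P2 * (1.0 - P1))
    p04 = (1.0 - P1) * (1.0 - P2)
    x = (x1, x2, x3)
    total = sum(xi * xi * (1.0 + pi / p04) for xi, pi in zip(x, p0))
    for i in range(3):
        for j in range(i + 1, 3):
            total += 2.0 * x[i] * x[j] * math.sqrt(p0[i] * p0[j]) / p04
    return total
```

The check fails when $P_1 = P_2 = \tfrac12$, where the denominators of (47) and (48) vanish; that case is excluded by the conditions of Theorem 1.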

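Therefore we may put $a_6 = 0$ and define A by the equation

(52)   $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha$

and conclude that the sequence of critical regions $(w_M)$ defined by the inequalities

(53)   $Bu^2 + v \ge A$ in $w_M$;   $Bu^2 + v < A$ elsewhere

will satisfy conditions (i) for a test of type $A_\infty$.

From (51) and (52)

(54)   $I(w_M) = \dfrac{1}{(2\pi)^{3/2}} \iiint_{w_M} e^{-\frac12(u^2+v^2+t^2)}\, du\, dv\, dt = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha.$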


By Theorem 1 of the Appendix, as mentioned above, we have

(55)   $|\beta(w_M \mid 0) - I(w_M)| < \epsilon$   for all $M > M_\epsilon$

i.e.

(56)   $|\beta(w_M \mid 0) - \alpha| < \epsilon$   for all $M > M_\epsilon$

and so

(57)   $\beta(w_M \mid 0) \to \alpha$   as $M \to \infty$.

Thus the sequence of critical regions $(w_M)$ satisfies the condition (ii) of the
definition of a test of type $A_\infty$.

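If w be any region defined by inequalities on u and v only (as are the regions
$w_M$) then, as a special case of Theorem 1 of the Appendix, we have that for
any $\epsilon > 0$ there exists a number $M_\epsilon$ such that for all $M > M_\epsilon$

(58)   $\left|P_M(w) - \dfrac{1}{2\pi} \iint_{w(u,v)} e^{-\frac12(u^2+v^2)}\, du\, dv\right| < \epsilon$

where $P_M(w) = P\{E_M \in w \mid 0\}$.

By (31) and (32), noting that $\dfrac{\partial}{\partial\theta} = \sqrt{M}\cdot\dfrac{\partial}{\partial\vartheta}$, we have

(59)   $\dfrac{\partial\beta(w \mid \vartheta)}{\partial\vartheta}\Big]_{\vartheta=0} = \sum_w f_1(u, v)\cdot u\cdot(P_1P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2} = \sum_w f_1(u, v)\cdot ku$

where $k = (P_1P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2} > 0$.

By Theorem 1 of the Appendix, as last stated above, we have

(60)   $f_1(u, v) = \dfrac{\Delta u\,\Delta v}{2\pi}\, e^{-\frac12(u^2+v^2)}(1 + R_M)$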



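where for convenience we have written $\Delta u$, $\Delta v$ for $\Delta_{(M)}u$, $\Delta_{(M)}v$, the units of u
and v when sample size is M, and $R_M$ for $R_M(u, v)$, which has the property that

(61)   $\sum_w R_M(u, v)\,\Delta_{(M)}u\,\Delta_{(M)}v\; e^{-\frac12(u^2+v^2)} \to 0$

uniformly with respect to w as $M \to \infty$.

Now let $w^+$ denote that part of w where $R_M \ge 0$ and $w^-$ that part of w where
$R_M < 0$. Then

(62)   $\sum_{w^+} ku f_1(u, v) = \sum_{w^+} k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u\, e^{-\frac12(u^2+v^2)} + \sum_{w^+} k\,\dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-\frac12(u^2+v^2)}.$

Let

(63)   $S_M^+ = k \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-\frac12(u^2+v^2)}.$

By Schwarz's inequality

(64)   $(S_M^+)^2 \le k^2 \left\{\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, R_M\, e^{-\frac12(u^2+v^2)}\right\} \left\{\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-\frac12(u^2+v^2)}\right\}.$

But

(65)   $\sum_{w^+} u^2 f_1(u, v) = \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 e^{-\frac12(u^2+v^2)} + \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-\frac12(u^2+v^2)}.$

Now $u^2 f_1(u, v) \ge 0$ and $\sum_W u^2 f_1(u, v)$ is finite (since $u^2$ is a homogeneous function
of second degree in the $x_i$, and so has a finite expectation) and is bounded
as $M \to \infty$. Hence $\sum_{w^+} u^2 f_1(u, v)$ is finite and bounded as $M \to \infty$. Further,
as $M \to \infty$

(66)   $\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 e^{-\frac12(u^2+v^2)} \to \dfrac{1}{2\pi} \iint_{w^+} u^2 e^{-\frac12(u^2+v^2)}\, du\, dv.$

Hence $\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-\frac12(u^2+v^2)}$ is bounded as $M \to \infty$. From this result,
together with (61) and (64) it follows that $S_M^+ \to 0$ as $M \to \infty$ uniformly with
respect to w. Putting

(67)   $S_M^- = k \sum_{w^-} \dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-\frac12(u^2+v^2)}$

it will follow in a similar manner that $S_M^- \to 0$ as $M \to \infty$ uniformly with
respect to w. Hence

(68)   $\sum_w ku f_1(u, v) = k \sum_w \dfrac{\Delta u\,\Delta v}{2\pi}\, u\, e^{-\frac12(u^2+v^2)} + S_M$

where $S_M = S_M^+ + S_M^-$ and so $S_M \to 0$ as $M \to \infty$ uniformly with respect to w.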



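Hence whatever be $\epsilon > 0$, there is a number $M'_\epsilon$ such that for all $M > M'_\epsilon$,

(69)   $\left|\dfrac{\partial\beta(w \mid \vartheta)}{\partial\vartheta}\Big]_{\vartheta=0} - \dfrac{k}{2\pi} \iint_{w(u,v)} u\, e^{-\frac12(u^2+v^2)}\, du\, dv\right| < \epsilon$

whatever be the region w. In particular we may take $w \equiv w_M$, and then
we have

(70)   $\dfrac{k}{2\pi} \iint_{w_M} u\, e^{-\frac12(u^2+v^2)}\, du\, dv = \dfrac{k}{2\pi} \int_{-\infty}^{\infty} u\, e^{-u^2/2} \left\{\int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = 0$

and so

(71)   $\left|\dfrac{\partial\beta(w_M \mid \vartheta)}{\partial\vartheta}\Big]_{\vartheta=0}\right| < \epsilon$   for all $M > M'_\epsilon$.

Hence

(72)   $\lim_{M\to\infty} \dfrac{\partial\beta(w_M \mid \vartheta)}{\partial\vartheta}\Big]_{\vartheta=0} = 0.$

Hence the sequence of critical regions $(w_M)$ satisfies condition (iii) for a test
of type $A_\infty$. This completes the proof of Theorem 1.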

In the above theorem we have found a test which is unbiased in the limit for
all cases except that for which $P_1 = P_2 = \tfrac12$. The following theorem derives
the test appropriate to this special case, and it is found that in this instance the
test takes a very simple form.

Theorem 2. If $P_1 = P_2 = \tfrac12$, the sequence of critical regions $(w_M)$ defined by

(73)   $|x_2 + x_3| \ge a$ in $w_M$;   $|x_2 + x_3| < a$ elsewhere

where

(74)   $\dfrac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-x^2/2}\, dx = 1 - \alpha$

(75)   $x_i = \dfrac{m_i - \tfrac14 M}{(\tfrac14 M)^{1/2}} \qquad (i = 2, 3),$

is associated with a test of the hypothesis $H_0(\theta = 0)$ of type $A_\infty$ at level of
significance $\alpha$.

The proof of this theorem follows the same lines as that of Theorem 1 as far
as inequality (42). On putting $P_1 = P_2 = \tfrac12$ in (42) we get

(76)   $(-\tfrac12 m_2 - \tfrac12 m_3 + \tfrac14 M - a_5)^2 - \tfrac14(m_2 + m_3 - \tfrac12 M) \ge a_4$

i.e.

(77)   $(x_2 + x_3 - a_6)^2 \ge a_7.$

The critical region $w_M$ defined in the statement of the theorem is of this
form with $a_6 = 0$ and $a_7 = a^2$.

Hence the sequence of critical regions $(w_M)$ satisfies conditions (i) of the
definition of a test of type $A_\infty$. The sequence of critical regions may also be
shown to satisfy conditions (ii) and (iii) for a test of type $A_\infty$ by following the
lines of the proof of Theorem 1 and noting that $x_2 + x_3 = 2M^{-1/2}(m_2 + m_3 - \tfrac12 M)$
tends to be distributed as a unit normal deviate as $M \to \infty$.

On account of the shape of the critical regions in the general case, I shall for
the remainder of this paper call the tests derived in the above theorems the
parabolic tests for the cases considered.


5. Application of the Parabolic Tests. For practical purposes the formulae
derived above are inconvenient to use. I will therefore express them in terms
of the deviations of the observed frequencies in the four cells from the frequencies
“expected” when the hypothesis $H_0(\theta = 0)$ is true, i.e. in terms of the
variables $d_i$, where

(78)   $d_i = m_i - Mp_{0i} = x_i(Mp_{0i})^{1/2} \qquad (i = 1, 2, 3, 4).$

The test then becomes “reject the hypothesis at level of significance $\alpha$ if
$v + Bu^2 \ge A$” where

(79)   $u = \dfrac{d_1(1 - P_1 - P_2) - d_2P_2 - d_3P_1}{[MP_1P_2(1 - P_1)(1 - P_2)]^{1/2}}$

(80)   $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)(d_1 + d_3) + P_2(1 - P_2)(2P_1 - 1)(d_1 + d_2)}{[MP_1P_2(1 - P_1)(1 - P_2)\{P_1(1 - P_1)(2P_2 - 1)^2 + P_2(1 - P_2)(2P_1 - 1)^2\}]^{1/2}}$

(81)   $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha$

(82)   $B = \left[\dfrac{MP_1P_2(1 - P_1)(1 - P_2)}{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2}\right]^{1/2}$

except when $P_1 = P_2 = \tfrac12$. In the latter case reject the hypothesis $H_0$ if

(83)   $\dfrac{|d_2 + d_3|}{\tfrac12 M^{1/2}} \ge a$

where

(84)   $\dfrac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-x^2/2}\, dx = 1 - \alpha.$

The application of this last case ($P_1 = P_2 = \tfrac12$) is straightforward. a may be
found from the tables of the Normal Probability Integral, $d_2$ and $d_3$ may be
calculated from the data, and we may then see whether the inequality (83) is
satisfied, and so assess our judgment of the hypothesis $H_0$.

TABLE I

Significance of Symbols

A and B are connected by the following relation:

   $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha$

   Table Ia: $\alpha = 0.05$                     Table Ib: $\alpha = 0.01$
   $\rho_{.05} = A - 3.8414588\,B$               $\rho_{.01} = A - 6.6348966\,B$

      B       $\rho_{.05}$                          B       $\rho_{.01}$
      0       1.6449                                0       2.3263
      1.00    0.322                                 1.00    0.289
      1.25    .256                                  1.25    .231
      1.50    .212                                  1.50    .192
      1.75    .181                                  1.75    .165
      2.00    .158                                  2.00    .144
      2.25    .141                                  2.25    .128
      2.50    .127                                  2.50    .115
      2.75    .116                                  2.75    .105
      3.00    .106                                  3.00    .096
      3.25    .098                                  3.25    .089
      3.50    .091                                  3.50    .082
      3.75    .084                                  3.75    .077
      4.00    .079                                  4.00    .072
      5       .063                                  5       .058
      6       .052                                  6       .048
      7       .045                                  7       .041
      8       .039                                  8       .036
      9       .035                                  9       .032
      10      .031                                  10      .029
      15      .021                                  15      .020
      20      .016                                  20      .014
      30      .010                                  30      .009
      40      .008                                  40      .007
      50      .006                                  50      .006

The general case is also straightforward, except for the determination of A
from equation (81). To facilitate this I have constructed Tables Ia and Ib.
These tables correspond respectively to significance levels .05, .01, and from
them the value of A corresponding to a given value of B may be calculated.
The quantity tabled, ($\rho$), is the difference between A and a multiple¹ (constant
for a given level of significance and given with the table to which it applies) of
B. To find A, therefore, B is calculated, multiplied by the appropriate constant,
and added to the quantity in the table corresponding to B. For large
values of B (40 and over) $\rho$ is small, and A may be taken equal to the constant
multiple of B.
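Where the tables do not reach, A can also be obtained by solving equation (81) numerically. The sketch below is one possible way of doing so and is not the method used to construct the tables: it evaluates the double integral with Simpson's rule and the complementary error function, and finds A by bisection. The function names, the integration limits, and the step count are assumptions.

```python
import math


def rejection_probability(A, B, half_width=8.0, steps=8000):
    """P(v + B u^2 >= A) for independent unit normal u and v, by Simpson's rule over u."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = -half_width + k * h
        density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        # Inner integral of (81): upper normal tail beyond A - B u^2.
        tail = 0.5 * math.erfc((A - B * u * u) / math.sqrt(2.0))
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * density * tail
    return total * h / 3.0


def critical_A(B, alpha, tol=1e-6):
    """Solve equation (81) for A by bisection; the left side decreases as A increases."""
    lo, hi = -10.0, 10.0 + 40.0 * B
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rejection_probability(mid, B) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rho(B, alpha, multiple):
    """Tabled quantity: A minus the constant multiple of B (3.8414588 at the .05 level)."""
    return critical_A(B, alpha) - multiple * B
```

For B = 0 the equation reduces to a one-sided normal tail, giving the first entry of each table; for large B the term in v matters little and A approaches the constant multiple of B.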

In particular cases when the values of $P_1$ and $P_2$ are substituted in the expression
for B (see Theorem 1 above) and in (79) and (80) above, these equations
appear much less formidable. Thus in the case considered by R. A. Fisher
[3], $P_1 = P_2 = \tfrac14$ and we get

(85)   $B = (\tfrac38 M)^{1/2}$

       $u = \tfrac43 M^{-1/2}(2d_1 - d_2 - d_3); \qquad v = -4(6M)^{-1/2}(2d_1 + d_2 + d_3)$

and the test becomes “reject the hypothesis $H_0$ at level of significance $\alpha$ when

(86)   $\phi = \{(2d_1 - d_2 - d_3)^2 - \tfrac32(2d_1 + d_2 + d_3)\}/\{\tfrac34(\tfrac32 M)^{1/2}\} \ge A$”

where

(87)   $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha.$

Example. Fisher [3] gives an example of the case $P_1 = P_2 = \tfrac14$. In the
series of experiments that he quotes the observed results fall in the four categories
respectively as follows:

   $m_1 = 32$; $m_2 = 904$; $m_3 = 906$; $m_4 = 1997$. $M = 3839$.

Hence $d_1 = -207.9375$; $d_2 + d_3 = 370.375$. From (86), $\phi = 10863.1$. $B = 37.94239$. From the tables:

   at .05 level, $A_M = 3.8414588 \times 37.94239 + 0.0075 = 145.7615$

   at .01 level, $A_M = 6.6348966 \times 37.94239 + 0.0065 = 251.750$.

Hence we reject the hypothesis that $\theta = 0$, i.e. that there is no linkage, since
the value of $\phi$ is well outside even the .01 level of significance.
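The arithmetic of the example can be reproduced from the general formulae (79), (80) and (82). The following sketch is illustrative; the function name `parabolic_statistic` is an assumption.

```python
import math


def parabolic_statistic(m, P1, P2):
    """Return (u, v, B, phi) with phi = v + B u^2, from equations (79), (80), (82)."""
    m1, m2, m3, m4 = m
    M = m1 + m2 + m3 + m4
    # Deviations from the frequencies expected under H0, equation (78).
    d1 = m1 - M * P1 * P2
    d2 = m2 - M * P1 * (1.0 - P2)
    d3 = m3 - M * P2 * (1.0 - P1)
    D = M * P1 * P2 * (1.0 - P1) * (1.0 - P2)
    K = P1 * (1.0 - P1) * (2.0 * P2 - 1.0) ** 2 + P2 * (1.0 - P2) * (2.0 * P1 - 1.0) ** 2
    u = (d1 * (1.0 - P1 - P2) - d2 * P2 - d3 * P1) / math.sqrt(D)
    v = (P1 * (1.0 - P1) * (2.0 * P2 - 1.0) * (d1 + d3)
         + P2 * (1.0 - P2) * (2.0 * P1 - 1.0) * (d1 + d2)) / math.sqrt(D * K)
    B = math.sqrt(D / K)
    return u, v, B, v + B * u * u


# Fisher's data as quoted in the example, with P1 = P2 = 1/4.
u, v, B, phi = parabolic_statistic((32, 904, 906, 1997), 0.25, 0.25)
```

With these data the function gives B close to 37.94239 and $\phi$ close to 10863.1, in agreement with the values quoted in the text.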



6. Power function of the Tests. General Case. The parabolic test as described
above has the desirable property that of all tests (at level of significance
$\alpha$) which are unbiased for large values of M this test will detect small variations
in $\theta$ most frequently. However, to get a clearer idea of the properties of this
test we shall calculate, as accurately as may be practicable, the power function
of the test.

¹ This multiple is equal to $k^2$ where $\dfrac{1}{\sqrt{2\pi}} \int_{-k}^{+k} e^{-x^2/2}\, dx = 1 - \alpha$, $\alpha$ being the level of
significance.

As a preliminary step we obtain a rough idea of the power function by making
use of the concept of a limiting power function as stated by Neyman [6]. This
may be defined as follows:

Let $E_{M'}$ denote the sample point corresponding to a sample of size $M'$, and put

(88)   $P\{E_{M'} \in w \mid \vartheta'\} = \beta_{M'}(w \mid \vartheta')$

where $\vartheta' = M'^{1/2}\theta$, w being a fixed region. Supposing $\vartheta'$ kept fixed, let $M'$ increase
and let

(89)   $\beta_\infty(w \mid \vartheta') = \lim_{M'\to\infty} \beta_{M'}(w \mid \vartheta')$

if this limit exists.

Then $\beta_\infty(w \mid \vartheta')$ is the limiting power function of the test associated with the critical
region w. It will be noted that the limiting power function is a function of $\vartheta'$.

In the problem under consideration the parabolic test when the sample size
is M is associated with the critical region $w_M$. Now it should be noted that
in the definition of the limiting power function w remains fixed. Therefore
the limiting power function of the parabolic test for sample size M is

(90)   $\beta_\infty(w_M \mid \vartheta') = \lim_{M'\to\infty} \beta_{M'}(w_M \mid \vartheta').$

The significance of the limiting power function is that for any $\epsilon > 0$ and for
any $\vartheta'$ there is a number $M_{\epsilon,\vartheta'}$ such that for all $M > M_{\epsilon,\vartheta'}$ we have in our case
(by Theorem 1 of the Appendix)

(91)   $|\beta_M(w_M \mid \vartheta') - \beta_\infty(w_M \mid \vartheta')| < \epsilon.$

It should be noted, however, that the limiting power curve (the graph of the
limiting power function against $\theta = \vartheta M^{-1/2}$) may be only a very rough approximation
to the actual power curve. Furthermore (Neyman, [6, p. 83]) we cannot,
in general, use the limiting power function of a test to answer the question:

“How large must we take our sample size M to detect the falsehood of the
hypothesis $H_0(\theta = 0)$ when actually $\theta = \theta'$ with a limiting probability of at
least, say, 0.95?”

For if we form a table as below

   M        $\vartheta'_{(M)} = M^{1/2}\theta'$        $\beta_\infty(w_M \mid \vartheta'_{(M)})$
   100
   1000

it is possible that $\beta_\infty(w_M \mid \vartheta'_{(M)})$ may never attain the value 0.95.





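Theorem 3. The limiting power function of the parabolic test is

(92)   $\beta_\infty(w_M \mid \vartheta) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac12(u - k\vartheta)^2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du$

in all cases for which $0 < P_i < 1$ and $P_1$ and $P_2$ are not both equal to $\tfrac12$.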

The proof of this theorem follows immediately from Theorem 1 of the Ap¬ 
pendix by applying the transformation (46)-(48) and putting X = PiP 2 . 

The above remarks concerning special precautions to be taken with respect
to the limiting power function suggest the necessity of studying the actual
power function of the parabolic test by some other method.

With this object in view, a study was made of the distribution of the function
$\phi = v + Bu^2$ for finite values of M and in particular for $M = 100$ and $M = 3839$.
$\phi$ is a discontinuous variate and, for any given value of M, has definite limits
of variation arising from the limitations on the values of the variables stated
in the inequalities (25), (26) above. These limits of variation of $\phi$ were found
to be

(93)   $-(\tfrac32 M)^{1/2} \le \phi \le (\tfrac32 M)^{1/2}(\tfrac92 M - 2)$

for the case $P_1 = P_2 = \tfrac14$. Hence when

   $M = 100$,    $-12.25 \le \phi \le 5486.80$,

   $M = 3839$,   $-75.89 \le \phi \le 1310795.75$.

Also it was found that

(94)   $\mathcal{E}(\phi \mid \theta) = B\left\{1 + \dfrac{(1 - 2P_1)(1 - 2P_2)}{(1 - P_1)(1 - P_2)}(e^{\theta} - 1) + \dfrac{(M - 1)P_1P_2}{(1 - P_1)(1 - P_2)}(e^{\theta} - 1)^2\right\}$

where $\mathcal{E}(\phi \mid \theta)$ denotes the expected value of $\phi$, given the value of the parameter
$\theta$. Thus when $P_1 = P_2 = \tfrac14$ we have $B = \sqrt{\tfrac38 M}$ and so $\mathcal{E}(\phi \mid 0) = \sqrt{\tfrac38 M}$.
Hence when

   $M = 100$,    $\mathcal{E}(\phi \mid 0) = 6.12372$,

   $M = 3839$,   $\mathcal{E}(\phi \mid 0) = 37.94239$.


It is thus seen that the distribution of $\phi$ might be represented by a Type III
curve, since the distribution of $\phi$ has a finite lower bound and a very long
positive tail. In order to fit a Type III curve, we must know the second moment
of the curve as well as its lower bound and mean. The general expression for
the second moment about zero is too complicated to be printed and so only the
numerical expressions obtained by giving special values to M are given below.
These are:

(i) $M = 100$

(95)   $\mathcal{E}(\phi^2 \mid \theta) = 112.41667 + 165.62963(e^{\theta} - 1) + 2493.33333(e^{\theta} - 1)^2$
       $+ 1078.00000(e^{\theta} - 1)^3 + 4356.91667(e^{\theta} - 1)^4,$

(ii) $M = 3839$

(96)   $\mathcal{E}(\phi^2 \mid \theta) = 4318.79213 + 6397.29625(e^{\theta} - 1) + 3684321.24073(e^{\theta} - 1)^2$
       $+ 1630267.33256(e^{\theta} - 1)^3 + 261530062.11111(e^{\theta} - 1)^4.$

Using the above results Type III curves were fitted to the distribution of $\phi$,
and approximate values of the power functions $\beta(w_M \mid \theta)$, at level of significance
.05, were calculated. This was obtained by evaluating $P\{\phi \ge A_M \mid \theta\}$ and
assuming the distribution of $\phi$ to be that given by the fitted curve. Then

(97)   $\beta(w_M \mid \theta) \cong P\{\phi \ge A_M \mid \theta\}.$

The values obtained for the limiting and approximate power functions are
given in Tables IIa, IIb. Unfortunately the agreement between the two is
not satisfactory.

Special Case. For the cases $P_1 = P_2 = \tfrac12$ ($M = 100$, $M = 400$) power
functions were calculated on the assumption that for a given value of $\theta$, the
random variable $2M^{-1/2}(d_2 + d_3)$ is distributed normally about a mean $M^{1/2}(e^{\theta} - 1)$
with standard deviation $\sqrt{e^{\theta}(2 - e^{\theta})}$. This is approximately the case for the
values of M considered. The approximate power functions so calculated are
given in Tables IIIa, IIIb.
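The normal approximation described for the special case can be sketched as follows. The function name is an assumption; the mean and standard deviation are those stated in the text, and since rejection is two-sided the sign attached to the mean does not affect the result.

```python
import math
from statistics import NormalDist


def special_case_power(M, theta, alpha=0.05):
    """Approximate power of the test |2 M^(-1/2) (d2 + d3)| >= a when P1 = P2 = 1/2.

    The statistic is treated as normal with mean sqrt(M) * (e^theta - 1) and
    standard deviation sqrt(e^theta * (2 - e^theta)); theta must be below log 2.
    """
    unit_normal = NormalDist()
    a = unit_normal.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    e = math.exp(theta)
    mean = math.sqrt(M) * (e - 1.0)
    sd = math.sqrt(e * (2.0 - e))
    lower_tail = unit_normal.cdf((-a - mean) / sd)
    upper_tail = 1.0 - unit_normal.cdf((a - mean) / sd)
    return lower_tail + upper_tail


# Entries comparable with Tables IIIa and IIIb at theta = 0.05.
power_100 = special_case_power(100, 0.05)
power_400 = special_case_power(400, 0.05)
```

At $\theta = 0.05$ this gives values close to the tabled 0.08029 for M = 100 and 0.17609 for M = 400.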


7. Parabolic Test and $\chi^2$ Test. It is interesting to note the close connection
between the parabolic test and the $\chi^2$ test as introduced for intuitive reasons
and normally used in testing for linkage. The $\chi^2$ test consists of calculating
the quantity

(98)   $\chi^2 = \dfrac{1}{MP_1P_2(1 - P_1)(1 - P_2)} \{(1 - P_1)(1 - P_2)m_1 - P_2(1 - P_1)m_2 - P_1(1 - P_2)m_3 + P_1P_2m_4\}^2$

and rejecting the hypothesis $H_0(\theta = 0)$ if $|\chi| \ge a$ where

(99)   $\dfrac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-x^2/2}\, dx = 1 - \alpha.$

In the special case ($P_1 = P_2 = \tfrac12$) the parabolic test and the $\chi^2$ test are identical;
while comparing (98) and (79) we see that in the general case

(100)   $u = \chi.$

Hence in the general case the criterion used in the parabolic test may be
written

(101)   $\phi = v + B\chi^2.$

(1) Large Samples. For large samples the first term of the expression
$v + B\chi^2$ is usually of small importance, since

   v is of form $M^{-1/2} \times$ (linear function of the $d_i$'s), while
   $B\chi^2$ is of form $M^{-1/2} \times$ (quadratic function of the $d_i$'s).

For such samples the $\chi^2$ test and parabolic test would appear to be nearly
equivalent.


TABLE II

Limiting and Approximate Power Functions of Parabolic Test

$P_1 = P_2 = \tfrac14$
$-\infty < \theta < 1.386$

   Table IIa: $M = 100$                         Table IIb: $M = 3839$

   $\theta$      Power                           $\theta$      Power
            Limiting   Approximate                     Limiting   Approximate
   -2.00               0.90870                 -0.25    0.99932    0.99853
   -1.50    0.99880                            -0.20    0.98502    0.97521
   -1.40               0.77656                 -0.15    0.87243    0.83620
   -1.20    0.97915    0.69505                 -0.10    0.54197    0.52066
   -1.05    0.93786                            -0.05    0.17827    0.19223
   -1.00               0.58580                  0.00    0.05000    0.04111
   -0.90    0.85024                             0.05    0.17827    0.21568
   -0.75    0.70467    0.42755                  0.10    0.54197    0.59517
   -0.60    0.51532                             0.15    0.87243    0.91641
   -0.45    0.32258    0.21849                  0.20    0.98502    0.99640
   -0.30    0.16986    0.12504                  0.25    0.99932    0.99999
   -0.15    0.07905    0.05689
   -0.10    0.06280    0.04438
   -0.05    0.05318    0.03866
    0.00    0.05000    0.04069
    0.05    0.05318    0.05021
    0.10    0.06280    0.07429
    0.15    0.07905
    0.30    0.16986    0.26559
    0.45    0.32258
    0.60    0.51532    0.75854
    0.75    0.70467    0.94245


Theorem 4. The limiting power function of the $\chi^2$ test is

(102)   $\beta_\infty(w_{\chi^2} \mid \vartheta) = 1 - \dfrac{1}{\sqrt{2\pi}} \int_{-a}^{+a} e^{-\frac12(u - k\vartheta)^2}\, du$

($w_{\chi^2}$ denotes the region defined by the inequality $|\chi| \ge a$).

This theorem may be proved by applying (46)-(48) to $Q_\theta(x_1, x_2, x_3)$ in
Theorem 1 of the Appendix, and noting that $u = \chi$ by (100).



TEST FOE LINKAGE 


245 


We notice that $\beta_\infty(w_{\chi^2} \mid \theta)$, for a given value of $\theta$, has the same value for all
values of $M$, unlike the limiting power function $\beta_\infty(w_M \mid \theta)$ of the parabolic
test. It is this point which accounts for the seeming paradox that, despite the
manner in which the parabolic test was defined, for all values of $\theta$ and $M$

(103) $\beta_\infty(w_{\chi^2} \mid \theta) \ge \beta_\infty(w_M \mid \theta)$

as may be deduced from (92) and (102). This does not mean that for any
given $\theta$ and all $M$ sufficiently large the power function of the $\chi^2$ test, $\beta_M(w_{\chi^2} \mid \theta)$,

TABLE III

Approximate Power Function

$P_1 = P_2 = \frac{1}{2}$;  $-\infty < \theta < 0.693$

Table IIIa ($M = 100$)

  $\theta$    Power
   -0.45      0.96288
   -0.40      0.92161
   -0.35      0.85072
   -0.30      0.74351
   -0.25      0.60197
   -0.20      0.44054
   -0.15      0.28380
   -0.10      0.15727
   -0.05      0.07737
    0.00      0.05000
    0.05      0.08029
    0.10      0.18177
    0.15      0.36464
    0.20      0.60278
    0.25      0.82071
    0.30      0.94975
    0.35      0.99299

Table IIIb ($M = 400$)

  $\theta$    Power
   -0.25      0.99424
   -0.20      0.95482
   -0.15      0.79787
   -0.10      0.47734
   -0.05      0.16378
   -0.02      0.06810
    0.00      0.05000
    0.02      0.06885
    0.05      0.17609
    0.10      0.55737
    0.15      0.90213
    0.20      0.99431
    0.25      0.99995


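is necessarily not less than the power function of the parabolic test, $\beta_M(w_M \mid \theta)$.
For although, given any $\epsilon > 0$, there is a number $M_{\epsilon,\theta}$ such that if $M > M_{\epsilon,\theta}$

(104) $|\beta_M(w_{\chi^2} \mid \theta) - \beta_\infty(w_{\chi^2} \mid \theta)| < \epsilon$

and

(105) $|\beta_M(w_M \mid \theta) - \beta_\infty(w_M \mid \theta)| < \epsilon$

it may be that for such values of $M$

(106) $0 < \beta_\infty(w_{\chi^2} \mid \theta) - \beta_\infty(w_M \mid \theta) < 2\epsilon$.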





The above results show, however, how close the agreement between the power
functions of the two tests is for large values of $M$. In fact we have

(107) $\lim_{M \to \infty} \beta_\infty(w_M \mid \theta) = \beta_\infty(w_{\chi^2} \mid \theta)$.

This may be easily proved, since as $M$ increases $w_M$ approximates to $w_{\chi^2}$.

(2) Small Samples. In order to obtain some idea of the relations between
the two tests when $M$ is small (i.e. less than 100), the case $P_1 = P_2 = \frac{1}{4}$, $M = 32$
was considered in some detail.

In this case our tests at the 5% level of significance are respectively

$\chi^2$ test, reject if

(108) $|2y - z| > 8.315$

parabolic test, reject if

(109) (2y - z) 2 - 1(2 y + z) > 69.570 
where 

(110) $y = d_1$, $z = d_2 + d_3$.
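The constant 8.315 in (108) can be checked with a short computation. The following Python sketch is an illustration only and rests on assumptions not stated in this passage: that $d_i$ denotes the deviation of the $i$-th cell count from its expectation under $H_0$, that $P_1 = P_2 = \frac{1}{4}$, and that the cutoff comes from the normal approximation to $2y - z$. The function name `chi_threshold` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi_threshold(M, P1, P2, alpha=0.05):
    """Two-sided normal-approximation cutoff for |2y - z| when theta = 0.

    Assumes y = d1 and z = d2 + d3, with d_i the deviation of the i-th
    cell count from its expectation under H0 (an assumption of this sketch).
    """
    # Probabilities of the first three cells under H0 (theta = 0).
    p = [P1 * P2, P1 * (1 - P2), P2 * (1 - P1)]
    # 2y - z is the linear combination 2*d1 - d2 - d3 of the cell counts.
    c = [2, -1, -1]
    first = sum(ci * pi for ci, pi in zip(c, p))
    second = sum(ci * ci * pi for ci, pi in zip(c, p))
    # Variance of a linear function of multinomial counts.
    variance = M * (second - first ** 2)
    return NormalDist().inv_cdf(1 - alpha / 2) * sqrt(variance)


threshold = chi_threshold(32, 0.25, 0.25)
print(round(threshold, 3))
```

Under these assumptions the variance of $2y - z$ is 18, and the cutoff agrees with (108) to three decimals.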

All samples for which the verdicts of the two above tests would not agree
were obtained. These were as follows:

(a) Samples for which $H_0$ is accepted by $\chi^2$ test, rejected by parabolic test

Probability of drawing sample of this type when $H_0$ is true is 0.00320.

(b) Samples for which $H_0$ is rejected by $\chi^2$ test, accepted by parabolic test

-1 0 1 2 3 5 6 7 8 8 9 9

9 11 13 15 1 3 5 6 7 8 9

Probability of drawing sample of this type when $H_0$ is true is 0.00038.

Thus the probability of the two tests giving different verdicts when $H_0$ is in
fact true is only 0.00358.

It will be noted that the above results imply that

(111) $\beta_{32}(w_{32} \mid 0) - \beta_{32}(w_{\chi^2} \mid 0) = 0.00320 - 0.00038 = 0.00282$;

i.e. that the true levels of significance of the two tests are not equal. This is
to be expected, because of the discontinuity of the probability distribution of
sample points, which makes it unlikely that the level of significance of either
test is exactly .05.

Similarly we can obtain values of $\beta_{32}(w_{32} \mid \theta) - \beta_{32}(w_{\chi^2} \mid \theta)$, the differences in
the powers of the two tests with respect to various alternative hypotheses.
These values were obtained for a few values of $\theta$.






  $\theta$     $\beta_{32}(w_{32} \mid \theta) - \beta_{32}(w_{\chi^2} \mid \theta)$
  -0.5          0.01625
   0.0          0.00282
   0.5         -0.00006


These figures indicate that the parabolic test detects negative $\theta$'s better than
the $\chi^2$ test, but that the $\chi^2$ test detects positive $\theta$'s better than the parabolic
test, although the advantage in this latter case is minute.

The critical regions associated with the two tests may be represented by 
regions in the ( y , z) plane. The critical region for the parabolic test will be 
defined by 

(112) ( 2y - zf - 1(2 y + z) > v 
and that for the $\chi^2$ test, $w_{\chi^2}$, by

(113) $(2y - z)^2 > v'$

where $v \approx v'$.

$w_{\chi^2}$ is therefore the complement of the region lying between the lines $L_1$, $L_2$
with equations $2y - z = \pm\sqrt{v'}$; $w_M$ lies outside the parabola $K$ with equation
(2 y - zf - |(2y + z) = v. 

Since $v \approx v'$, $K$ meets $L_1$, $L_2$ at points near the respective intersections of
$L_1$, $L_2$ with the line $2y + z = 0$. See Figure 1.

In the diagram the regions $V_1$, $V_2$ contain all sample points for which the
$\chi^2$ test rejects and the parabolic test accepts $H_0$; $U_1$, $U_2$ contain all sample
points for which the $\chi^2$ test accepts and the parabolic test rejects $H_0$.

For a given value of $\theta$ it is known that the probability distribution is
approximately such that the quantity

= \y ~ *M(e' - l)} 2 , [z + \M{e* - l)| s 

(114) + _1) - iW-1) 

, 1 V + z + AM(e* - l)} 2 
+ 1) 

is distributed as $\chi^2$ with 2 degrees of freedom.

The ellipses of equal density $\chi^2_\theta = \text{constant}$ have centers at points
$(\frac{1}{16}M[e^\theta - 1],\ -\frac{1}{8}M[e^\theta - 1])$ which must lie on the line $2y + z = 0$. When $\theta = 0$ the center
is at the origin, and the major and minor axes of the ellipse make angles of
approximately 99.5° and 9.5° respectively with the y-axis. For small changes
in $\theta$ the angles of inclination of the major and minor axes of the ellipse to the
coordinate axes are not greatly changed, and we see that as the center of the
ellipse moves along the line $2y + z = 0$ we have

{1)0 increasing: center moves downwards, tending to increase P{E tU t ] — 
\E t V t ] while P\E t F,) and P\E * FiJ both become small. Thus 0 M {w M \ 0) 
tends to increase quicker than 0 *(«**• | 0). 





(2) $\theta$ decreasing: here we have the opposite effect and $\beta_M(w_M \mid \theta)$ tends to
increase slower than $\beta_M(w_{\chi^2} \mid \theta)$.

These conclusions agree qualitatively with those drawn in the case $M = 32$.
(N.B. In the case $M = 32$ no sample points fall into the region $U_1$ because no
points in $U_1$ satisfy the inequalities (25), (26).)


8. Some Geometrical Considerations. In this section we shall consider the
manner in which the situations dealt with above may be interpreted in terms
of geometrical concepts. It will be convenient to consider as variables $n_i = m_i/M$.
The sample space $W(n)$ is then bounded by the four planes

(115) $n_i = 0 \quad (i = 1, 2, 3), \qquad \sum_{i=1}^{3} n_i = 1$.

In this space, corresponding to any admissible hypothesis $H_\theta$ specifying a
value of $\theta$, there is a point $T_\theta$ with coordinates $({}_\theta n_1, {}_\theta n_2, {}_\theta n_3)$ where

(116) ${}_\theta n_1 = P_1 P_2 e^\theta$,
      ${}_\theta n_2 = P_1(1 - P_2 e^\theta)$,
      ${}_\theta n_3 = P_2(1 - P_1 e^\theta)$.




These are the proportions of results expected in the first three cells, if the
hypothesis $H_\theta$ specifying $\theta$ be true.

Now, if $H_\theta$ be true, we have

(117) .Pint — n[ , n 2 — n 2 , n* = nj , n 4 = n[ | lie] — 


where c is constant for a fixed sample size ilf, and 

[SJ 


(M8) 


x. 

M 


r 


3 

I 

t-1 


i 


Hence the most frequent position(s) of the sample point $E$ will be somewhere
near the point $T_\theta$, which I shall therefore call the center of density. It
will be noticed that, whatever be the value of $\theta$, the point $T_\theta$ must lie on the line

(119) $n_1 - P_1 P_2 = -[n_2 - P_1(1 - P_2)] = -[n_3 - P_2(1 - P_1)]$.

This line, a segment of which is the locus of the center of density for our set of
admissible hypotheses, will be called the line of density.
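As a check on the statement that the center of density always lies on the line of density, the following Python sketch evaluates the expected proportions of the first three cells for a few values of $e^\theta$ and tests the two equalities defining the line. It is an illustration only: the function names are hypothetical, the choice $P_1 = P_2 = \frac{1}{4}$ and the particular values of $e^\theta$ are assumptions made for the example, and exact rational arithmetic is used so that the equalities can be tested without rounding.

```python
from fractions import Fraction


def center_of_density(P1, P2, e_theta):
    """Expected proportions in the first three cells; e_theta stands for exp(theta)."""
    n1 = P1 * P2 * e_theta
    n2 = P1 * (1 - P2 * e_theta)
    n3 = P2 * (1 - P1 * e_theta)
    return n1, n2, n3


def on_line_of_density(P1, P2, point):
    """True when the point satisfies both equalities defining the line of density."""
    n1, n2, n3 = point
    shift = n1 - P1 * P2
    return shift == -(n2 - P1 * (1 - P2)) == -(n3 - P2 * (1 - P1))


P = Fraction(1, 4)
# Hypothetical values of exp(theta), chosen only for illustration.
points = {e: center_of_density(P, P, e)
          for e in (Fraction(1, 2), Fraction(1), Fraction(3, 2))}
for e, point in points.items():
    print(e, point, on_line_of_density(P, P, point))
```

The reason the check succeeds for every $\theta$ is that each of the three coordinates differs from its value at $\theta = 0$ by $\pm P_1 P_2 (e^\theta - 1)$.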

In this space the parabolic test corresponds to a critical region comprising the
exterior of a parabolic cylinder. The equation of the boundary of this critical
region at level of significance .05 was found for the case $P_1 = P_2 = \frac{1}{4}$, and a
model made of it. Also included in the model were the ellipsoids

(120) $\chi^2_\theta = K_{.05}$

where $K_{.05}$ is a constant so chosen that

(121) $P\{\chi^2_\theta > K_{.05} \mid \theta\} = .05$

corresponding to

(i) the case when $H_0$ is true
(ii) the cases when

(122) (a) $p_1 = \frac{3}{32}$; $p_2 = p_3 = \frac{5}{32}$; $p_4 = \frac{19}{32}$, i.e. $\theta = 0.41$

(123) (b) $p_1 = \frac{1}{32}$; $p_2 = p_3 = \frac{7}{32}$; $p_4 = \frac{17}{32}$, i.e. $\theta = -0.69$.

It was found that in the case $P_1 = P_2 = \frac{1}{4}$ one axis of all the $\chi^2_\theta$-ellipsoids
was perpendicular to the plane through the line of density and the axis of $n_1$.
The generators of the boundary of the parabolic acceptance region are also
perpendicular to this plane. (By "acceptance region" is meant the complement
of the critical region. The acceptance region may be written symbolically
$\bar{w}_M$.) There were further added to the model the intersections with this plane
of the ellipsoids at probability level .01, corresponding to the three hypotheses
considered above ($\theta = 0$, $0.41$, $-0.69$) and two others, viz.

(124) $p_1 = \frac{5}{32}$; $p_2 = p_3 = \frac{3}{32}$; $p_4 = \frac{21}{32}$, i.e. $\theta = 0.92$,

(125) $p_1 = \frac{1}{64}$; $p_2 = p_3 = \frac{15}{64}$; $p_4 = \frac{33}{64}$, i.e. $\theta = -1.39$.





For convenience in making the model to a simple scale (1 unit = 150 cms.) it
was found necessary to take the sample size $M$ as 1312.5. The model is shown
in Figure 2. It will be seen that the acceptance region for the parabolic test
is approximately enclosed between two parallel planes perpendicular to the
plane common to the line of density and the axis of $n_1$. These two planes, in
fact, enclose the acceptance region for the $\chi^2$ test. The vertex of the normal
parabolic section of the parabolic acceptance region is at a comparatively great
distance "below" the plane $n_1 = 0$.

Fig. 2

As an interesting digression we may use our model to compare qualitatively
the parabolic test with yet a third possible test of $H_0$. This test is to reject
$H_0$ at level of significance .05 if

(126) $\chi^2_0 > K_{.05}$

and may be called the $\chi^2_0$ test. The $\chi^2_0$-ellipsoid shown in the model is the
acceptance region for this test. It will be noticed that when $\theta \ne 0$ the ellipsoids





of equal density include somewhat more of the acceptance region of the $\chi^2_0$ test
than of the parabolic acceptance region. This means that the $\chi^2_0$ test would
detect that the hypothesis $H_0$ ($\theta = 0$) is false in those cases less frequently than
would the parabolic and $\chi^2$ tests. We also notice that the center of density
$T_\theta$ leaves the parabolic acceptance region before it leaves the acceptance region
of the $\chi^2_0$ test as it moves along the line of density from the point where $\theta = 0$,
whether the direction of motion of $T_\theta$ corresponds to $\theta$ increasing or decreasing.
This also indicates that the $\chi^2_0$ test would act less efficiently than the other
two tests.


9. Appendix. In this appendix are obtained various results which, while
essential to the main argument, would appear as digressions if they were
interpolated as required. The numbering of equations in this appendix does not
continue from that of the previous sections, but forms a separate group.

Lemma. If $f_0(m), f_1(m), \ldots, f_s(m)$ be $(s + 1)$ functions of the $k$ variables
$m_1, m_2, \ldots, m_k$ which are zero except for a finite number of sets of integral values
of $m_1, \ldots, m_k$; and if $w_0$ be a region in the space of $m$'s such that

(1) $f_0(m) \ge \sum_{i=1}^{s} a_i f_i(m)$ in $w_0$

(2) $f_0(m) \le \sum_{i=1}^{s} a_i f_i(m)$ in $\bar{w}_0$

$a_1, a_2, \ldots, a_s$ being arbitrary constants; then if $w$ be any region such that

(3) $\sum_{w} f_i(m) = \sum_{w_0} f_i(m) \quad (i = 1, \ldots, s)$,

we shall have

(4) $\sum_{w} f_0(m) \le \sum_{w_0} f_0(m)$.

Proof. Let

(5) $\delta = \sum_{w_0} f_0(m) - \sum_{w} f_0(m) = \sum_{w_0 - ww_0} f_0(m) - \sum_{w - ww_0} f_0(m)$

where $ww_0$ denotes the common part of $w$ and $w_0$.

Hence the region $w - ww_0$, consisting of those points of $w$ which are not in
$ww_0$, and so not in $w_0$, is contained in $\bar{w}_0$. Similarly the region $w_0 - ww_0$ is
contained in $w_0$. Hence, by inequalities (1) and (2),

(6) $\delta \ge \sum_{w_0 - ww_0} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\} - \sum_{w - ww_0} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\}$

and so

(7) $\delta \ge \sum_{w_0} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\} - \sum_{w} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\}$.

Since the total number of terms in each double summation is finite, we have

(8) $\delta \ge \sum_{i=1}^{s} a_i \Big\{ \sum_{w_0} f_i(m) - \sum_{w} f_i(m) \Big\}$.

But

(9) $\sum_{w_0} f_i(m) = \sum_{w} f_i(m), \quad (i = 1, \ldots, s)$.

Hence

$\delta \ge 0$, and $\sum_{w} f_0(m) \le \sum_{w_0} f_0(m)$.

w w o 

A lemma similar to the lemma above, where the $f$'s are taken to be integrable
functions and summation over the regions $w$, $w_0$ is replaced by integration over
these regions, is given by Neyman and Pearson [9]. The proof given above
follows the lines of the proof given in that paper.
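The content of the lemma can be illustrated by exhaustive search on a small example. The Python sketch below is an illustration only, under stated assumptions: it takes $s = 1$, a single constant $a_1 = 1$, and two hypothetical functions `f0` and `f1` on six points, all invented for the example. It forms $w_0$ from the inequalities of the lemma, enumerates every region $w$ with the same sum of `f1`, and records the largest sum of `f0` attained.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical functions on the points 0..5 (values invented for illustration).
f0 = [Fraction(v, 20) for v in (1, 2, 3, 4, 4, 6)]
f1 = [Fraction(v, 20) for v in (4, 4, 4, 4, 2, 2)]
a1 = Fraction(1)

# w0: the points where f0 >= a1 * f1; elsewhere f0 <= a1 * f1 holds automatically.
w0 = {m for m in range(len(f0)) if f0[m] >= a1 * f1[m]}


def total(f, region):
    """Sum of f over the points of a region."""
    return sum((f[m] for m in region), Fraction(0))


def best_competitor(f0, f1, w0):
    """Largest sum of f0 over any region w whose sum of f1 equals that over w0."""
    target = total(f1, w0)
    best = None
    for size in range(len(f0) + 1):
        for w in combinations(range(len(f0)), size):
            if total(f1, w) == target:
                value = total(f0, w)
                if best is None or value > best:
                    best = value
    return best


print(sorted(w0), total(f0, w0), best_competitor(f0, f1, w0))
```

In this example no competing region exceeds the sum over $w_0$, which is what the lemma asserts; the search is exponential in the number of points and is meant only as a check on small cases.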

Theorem 1. Suppose that, in a quadrinomial population:

(i) the cell probabilities are dependent on the number $M$ of trials made, and are
given by

(10) $p_1 = p_{01} + \varphi_M$,
     $p_2 = p_{02} - \varphi_M$,
     $p_3 = p_{03} - \varphi_M$,
     $p_4 = p_{04} + \varphi_M$

where

(11) $\sum_{i=1}^{4} p_{0i} = \sum_{i=1}^{4} p_i = 1$

and

( 12 ) - 1 ) 

(ii)

(13) $x_i = (m_i - M p_{0i})/(M p_{0i})^{1/2} \quad (i = 1, 2, 3, 4)$

where $m_i$ = number of results falling in $i$-th cell.

(iii) $w(x)$, or briefly $w$, is a region in the space $W$ of $x_1, x_2, x_3$; and $P_M(w)$
is the integral probability law of $w$ corresponding to the values $p_1, p_2, p_3, p_4$ of
the cell probabilities given in (i) above when we have $M$ independent trials.

Then 


p ' w ~ ^/// 


(14) 





uniformly over $W$ as $M \to \infty$, where

3 

Q«(xi, xt, xj) = 2 <(1 + Po.Pm) + 2 pot 2 xaiipoiPv)' 

«—1 *</ j £8 

(15) — 2Xt?{xi(poi 1 ~ P 02 P 04 ) — ^(poi 1 + pwpoi 1 ) 

4 

” ^(pos 1 + PwP04 l ) j + X 2 ^ 2 2 Po< 1 . 

4-1 

This theorem may be proved by the same method as that used by F. N. 
David [2] in proving the generalized theorem of Laplace. 

I would like to thank Professor Neyman for his invaluable suggestions and
advice in the preparation of this paper.

REFERENCES 

[1] Joseph F. Daly, Annals of Math. Stat., Vol. 11 (1940), p. 1.

[2] F. N. David, Stat. Res. Mem., Vol. 2 (1938), p. 69.

[3] R. A. Fisher, Statistical Methods for Research Workers, 6th ed. (1936), Oliver and
Boyd, London.

[4] J. Neyman, Phil. Trans., A, Vol. 236 (1937), p. 333.

[5] J. Neyman, Skand. Aktuar. Tidskr., Vol. 20 (1937), p. 149.

[6] J. Neyman, Ann. Math. Stat., Vol. 9 (1938), p. 69.

[7] J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 223.

[8] J. Neyman and E. S. Pearson, Phil. Trans., A, Vol. 231 (1933), p. 289.

[9] J. Neyman and E. S. Pearson, Stat. Res. Mem., Vol. 1 (1936), p. 1.

[10] J. Neyman and E. S. Pearson, Stat. Res. Mem., Vol. 2 (1938), p. 26.


Department of Statistics, 
University College, 
London, England 



REDUCTION OF A CERTAIN CLASS OF COMPOSITE 
STATISTICAL HYPOTHESES 

By George W. Brown 

1. Introduction. A situation frequently met in sampling theory is the
following: $x$ has distribution $f(x, \theta)$, where $\theta$ is an unknown parameter, and for
samples $(x_1, \ldots, x_n)$ there exists in the sample space $E_n$ a family of $(n - 1)$-dimensional
manifolds upon each of which the distribution is independent of
$\theta$; in addition there is a residual one-dimensional manifold available for
estimating $\theta$. For example, suppose there exists a sufficient statistic $T$ for $\theta$, then on
the manifolds $T = T_0$ there is defined an induced distribution which is
independent of the parameter.

A similar situation is observed when $\theta$ is a "location" or "scale" parameter.
Let $x$ have the distribution $f(x - a)$ for some $a$, then the set $(x_2 - x_1, \ldots, x_n - x_1)$,
or any equivalent set, such as $(x_1 - \bar{x}, \ldots, x_n - \bar{x})$, have a
joint distribution independent of $a$, and there is a residual distribution
corresponding to each particular configuration $(x_2 - x_1, \ldots, x_n - x_1)$. Fisher
[1] and Pitman [5] have examined the residual distributions in connection with
the problem of estimating scale and location parameters. In this paper we
shall be concerned primarily, not with the residual distribution, but with the
remainder of the sample information, corresponding to the $(n - 1)$-dimensional
distribution which is independent of the parameter. It is found, in a rather
broad class of distributions, that the part of the sample not used for estimation
determines, except for the parameter value, the original functional form of the
distribution of $x$.

This paper is devoted mainly to a study of particular classes of distributions
having the property mentioned above. We consider also the theoretical
application of this property to certain types of composite hypotheses which may be
reduced thereby to equivalent simple hypotheses.¹ The principal results of this
nature may be summed up as follows: If $x$ has distribution of the form $f(x, \theta)$,
where $\theta$ is either a location or scale parameter, or a vector denoting both, then
there exists, in samples $(x_1, \ldots, x_n)$, a set of functions $y_i(x_1, \ldots, x_n)$, $i = 1, 2, \ldots, p$,
$p < n$, having joint distribution $D(y_1, \ldots, y_p)$ independent of $\theta$,
and such that the converse statement holds, namely, if $\{y_i\}$ have the distribution
$D(y_1, \ldots, y_p)$, then $x$ has, for some $\theta$, a distribution of the form $f(x, \theta)$. There
is a corresponding statement when $x$ has a distribution of the form $f(x - \sum a_i u_i)$,
where the $\{a_i\}$ are parameters, and the $\{u_i\}$ are regression variables.

¹ We use the terms simple and composite hypotheses in the sense of Neyman and
Pearson [2].


254 



COMPOSITE HYPOTHESES 


255 


2. Location and Scale. This section is devoted to the study of functions of
the sample observations which are such that their distributions determine the
distribution of $x$, except possibly for location and scale.

It will be assumed that associated with $x$ there is a function $F(x)$ such that

(a) $F(x)$ is monotone non-decreasing,

(b) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$, and (c) $F(\infty) = \lim_{x \to \infty} F(x) = 1$

with the normalization $F(x)$ upper semi-continuous. $F(x)$ is the probability
that the random variate takes a value less than or equal to $x$. If $F(x)$ is
associated with the random variate $x$ we say that $x$ has the distribution $F(x)$.
If $g(x)$ is a Borel-measurable function, the Lebesgue-Stieltjes integral
$\int_{-\infty}^{\infty} g(x)\, dF(x)$ is denoted by $E[g(x)]$.

The characteristic function $\varphi(t) = E(e^{itx})$ determines $F(x)$, that is, if
$\int_{-\infty}^{\infty} e^{itx}\, dG(x) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$, then $F(x) = G(x)$.


Similarly, let $F(x_1, \ldots, x_k)$ be such that

(a) $F(x_1, \ldots, x_{i-1}, x_i + h, x_{i+1}, \ldots, x_k) \ge F(x_1, \ldots, x_i, \ldots, x_k)$ for
$h > 0$ and $i = 1, 2, \ldots, k$;

(b) $\lim_{x_i \to -\infty} F(x_1, \ldots, x_k) = 0$, $i = 1, 2, \ldots, k$;

(c) $\lim_{x_1, \ldots, x_k \to \infty} F(x_1, \ldots, x_k) = 1$;

with the normalization $F(x_1, \ldots, x_k)$ continuous on the right, in each $x_i$. If
$F(x_1, \ldots, x_k)$ is associated with $x_1, \ldots, x_k$ we say that $x_1, \ldots, x_k$ have the
joint distribution $F(x_1, \ldots, x_k)$. As before, $E[H(x_1, \ldots, x_k)] = \int_{R_k} H\, dF$,
where $R_k$ is the Euclidean $k$-space. It is well known that under such conditions,
given Borel-measurable functions $y_i(x_1, \ldots, x_k)$, $i = 1, \ldots, p$, $p < k$,
then $G(y_1, \ldots, y_p) = \int_{R(y)} dF(x_1, \ldots, x_k)$, where $R(y)$ is the region
$[y_1(x_1, \ldots, x_k) \le y_1, \ldots, y_p(x_1, \ldots, x_k) \le y_p]$, is again a distribution function satisfying
the conditions above. Moreover,

$\int_{R} g(y_1, \ldots, y_p)\, dG(y_1, \ldots, y_p) = \int_{R'} g[y_1(x_1, \ldots, x_k), \ldots, y_p(x_1, \ldots, x_k)]\, dF$,

where $R'$ is the set of all points $(x_1, \ldots, x_k)$ such that
$[y_1(x_1, \ldots, x_k), \ldots, y_p(x_1, \ldots, x_k)] \in R$.

If $x$ has distribution $F(x)$, then, by definition, the set $(x_1, \ldots, x_n)$ is a sample
from this distribution if $x_1, \ldots, x_n$ have the joint distribution $F(x_1) \cdots F(x_n)$.

The following theorem states that two distributions giving rise, in sampling,
to the same distribution of the set $x_1 - x_n, x_2 - x_n, \ldots, x_{n-1} - x_n$, with
$n \ge 3$, can differ at most by a translation, that is, the distribution of that set
determines the original distribution except for location.

Theorem Ia: Let $x$ have the distribution $F(x)$. Denote by $S$ the set of zeros of
$\int e^{itx}\, dF(x)$ and denote by $\epsilon$ the g.l.b. of $|t|$ for $t$ in $S$. Suppose that the
complement of $S$ is $\epsilon$-connected.² Suppose that $x'$ has distribution $G(x')$, and let $x_1, \ldots, x_n$
and $x'_1, \ldots, x'_n$ be samples. Then the set $w_\alpha = x_\alpha - x_n$, $\alpha = 1, \ldots, n - 1$,
have the same joint distribution as the set $w'_\alpha = x'_\alpha - x'_n$ if and only if there exists
a constant $a$ such that $x' + a$ and $x$ have the same distribution.

Proof: The sufficiency of the condition follows immediately, since
$w'_\alpha = x'_\alpha - x'_n = (x'_\alpha + a) - (x'_n + a)$.

In establishing necessity, only the fact that $w_1$, $w_2$ have the same joint
distribution as $w'_1$, $w'_2$ is needed. This hypothesis implies that

$E[e^{i(t_1 w_1 + t_2 w_2)}] = E[e^{i(t_1 w'_1 + t_2 w'_2)}]$,

that is,

$E[e^{i(t_1 x_1 + t_2 x_2 - (t_1 + t_2) x_n)}] = E[e^{i(t_1 x'_1 + t_2 x'_2 - (t_1 + t_2) x'_n)}]$.

Set $\varphi(t) = E(e^{itx})$, $\psi(t) = E(e^{itx'})$. The relation above becomes

(1) $\varphi(t_1)\varphi(t_2)\varphi(-t_1 - t_2) = \psi(t_1)\psi(t_2)\psi(-t_1 - t_2)$.

Consider equation (1) for values of $t_1$, $t_2$ in the neighborhood of $t = 0$. $\varphi(0) =
\psi(0) = 1$, hence there is an interval $|t| < \delta$, in which $\varphi(t)$ and $\psi(t)$ do not
vanish. It is easily shown that $\varphi(t)$ and $\psi(t)$ are each continuous, since $e^{itx}$, in
the neighborhood of $t = 0$, is continuous uniformly for any bounded interval
of $x$, and since $A$ may be chosen so that $1 - F(A)$ and $F(-A)$ are both as small
as desired. In the interval $|t| < \delta$ the function $f(t) = \varphi(t)/\psi(t)$ is continuous.
Also, $\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$. Setting $t_2 = 0$ in (1) we obtain
$\varphi(t)\varphi(-t) = \psi(t)\psi(-t)$, hence $|\varphi(t)| = |\psi(t)|$, that is, $|f(t)| = 1$. $f(t)$ takes
values on the unit circle of the complex plane, and $f(0) = 1$, hence there is an
interval $|t| < \delta'$ such that $z = f(t)$ lies on an arc $\gamma$, of length less than $2\pi$,
containing the point $z = 1$. Now consider the functional equation (1) for
$|t_1| < \frac{1}{2}\delta'$, $|t_2| < \frac{1}{2}\delta'$. (1) becomes

$f(t_1) f(t_2) f(-t_1 - t_2) = 1$.

The interval $|t| < \delta'$ was so chosen that for $|t_1| < \frac{1}{2}\delta'$, $|t_2| < \frac{1}{2}\delta'$, it is possible
to define a single-valued branch of the argument of $f(t_1)$, $f(t_2)$, and $f(t_1 + t_2)$.
Letting $t_2 = 0$ we have $f(t)f(-t) = 1$, hence, replacing $f(-t_1 - t_2)$ by $1/f(t_1 + t_2)$
in the last equation, we have

$f(t_1) f(t_2) = f(t_1 + t_2)$.

$\arg f(t_1)$, $\arg f(t_2)$, and $\arg f(t_1 + t_2)$ are uniquely determined, except for some
fixed multiple of $2\pi$. If we choose the principal value of the argument, i.e., so

² The set $S$ is $\epsilon$-connected if any two points $p$, $q$ in $S$ can be connected by an $\epsilon$-chain,
i.e., there exists a set $p_0 = p, p_1, \ldots, p_{n-1}, p_n = q$, such that $|p_i - p_{i-1}| < \epsilon$,
$i = 1, 2, \ldots, n$.





that $0 \le \arg f(t) < 2\pi$, we must have

$\arg f(t_1) + \arg f(t_2) = \arg f(t_1 + t_2)$

for $|t_1| < \frac{1}{2}\delta'$, $|t_2| < \frac{1}{2}\delta'$. Since $\arg f(t)$ is continuous, any solution of this well
known functional equation must be of the form $\arg f(t) = at$. $|f(t)| = 1$,
therefore there exists a constant $a$ such that $f(t) = e^{iat}$, for $|t| < \frac{1}{2}\delta'$, that is,
$\varphi(t) = e^{iat}\psi(t)$, for $|t| < \frac{1}{2}\delta'$. By use of (1) this may be extended to hold for
all $t$ such that $|t| < \epsilon$, where $\epsilon$ is the minimum modulus of all $t$ such that $\varphi(t) = 0$.
(1) may now be used to extend the relation for all $t$ such that $\varphi(t) \ne 0$ by
choosing an $\epsilon$-chain connecting the origin to the point $t$. We know already that
$\varphi(t) = e^{iat}\psi(t)$ if $\varphi(t) = 0$, hence it holds for all $t$. This relation says that
$E(e^{itx}) = E(e^{it(x' + a)})$, hence $x' + a$ and $x$ have the same distribution, thus
completing the demonstration of the theorem.

It should be remarked that the set $(x_1 - x_n, \ldots, x_{n-1} - x_n)$ may be replaced
in Theorem Ia by any equivalent set, for example, $(x_1 - \bar{x}, \ldots, x_{n-1} - \bar{x})$.
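The two relations on which the proof turns can be checked numerically for a simple case. The following Python sketch is an illustration only: the three-point distribution `dist_x`, the translation `a`, and the function names are hypothetical choices made for the example. It forms the characteristic functions of $x$ and of $x' = x - a$, and compares both the triple products appearing in (1) and the relation $\varphi(t) = e^{iat}\psi(t)$.

```python
import cmath


def char_fn(dist, t):
    """Characteristic function E[exp(itx)] of a discrete distribution {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


def triple_product(dist, t1, t2):
    """The product phi(t1) * phi(t2) * phi(-t1 - t2) that appears on each side of (1)."""
    return char_fn(dist, t1) * char_fn(dist, t2) * char_fn(dist, -t1 - t2)


# Hypothetical distribution of x and hypothetical translation a.
dist_x = {0.0: 0.2, 1.0: 0.5, 3.0: 0.3}
a = 1.7
# x' = x - a, so that x' + a has the distribution of x.
dist_xp = {x - a: p for x, p in dist_x.items()}

for t1, t2 in [(0.3, -1.1), (2.0, 0.7)]:
    gap = abs(triple_product(dist_x, t1, t2) - triple_product(dist_xp, t1, t2))
    print(t1, t2, gap)
```

The translation contributes a factor $e^{-iat}$ to each characteristic function, and these factors cancel in the triple product because the three arguments sum to zero; this is why the differences $x_\alpha - x_n$ carry no information about location.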

The next result is of the same nature as Theorem Ia except for the replacement
of the location parameter by a scale (positive or negative) parameter.


Theorem Ib: Let $x$ have distribution $F(x)$, such that the zeros of
$\int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x)$ are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \ldots, x_n$ and
$x'_1, \ldots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \ge 3$, then the set
$w_\alpha = x_\alpha/x_n$, $\alpha = 1, \ldots, n - 1$, have the same distribution as the set $w'_\alpha =
x'_\alpha/x'_n$ if and only if there exists a constant $c$ such that $cx'$ and $x$ have the same
distribution.

Proof: The sufficiency of the condition is evident. Suppose, then, as before,
that $w_1$, $w_2$ have the same joint distribution as $w'_1$, $w'_2$. $\log|w_1|$ and $\log|w_2|$
have the same joint distribution as $\log|w'_1|$ and $\log|w'_2|$, hence by application
of Theorem Ia to $\log|x|$ and $\log|x'|$ it follows (since the complement of a
nowhere dense set is $\epsilon$-connected for every $\epsilon$) that there exists a constant $a$ such
that

$\int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it[\log|x'| - a]}\, dG(x')$.

Let $y = e^{-a}x'$, then $|x|$ and $|y|$ have the same distribution, and

(2) $\int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it \log|y|}\, dH(y)$,

where $y$ has distribution $H(y)$. We now have to show that either $y$ or $-y$ has
the distribution of $x$, that is, it must be shown that either $H(y) = F(y)$, or
$H(y) = 1 - F(-y)$.

By the first part of the theorem the functions $u_1 = y_1/y_3$ and $u_2 = y_2/y_3$ have
the same joint distribution as $w_1$, $w_2$. It is clear that the mean value of any
function of $w_1$ and $w_2$ is the same as the mean value of the corresponding
function of $u_1$ and $u_2$. Hence

$\iiint e^{i[t_1 \log|w_1| + t_2 \log|w_2|]} \operatorname{sgn} w_1 \operatorname{sgn} w_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$
$= \iiint e^{i[t_1 \log|u_1| + t_2 \log|u_2|]} \operatorname{sgn} u_1 \operatorname{sgn} u_2\, dH(y_1)\, dH(y_2)\, dH(y_3)$,

where $\operatorname{sgn} x = 1$ for $x > 0$, $\operatorname{sgn} x = -1$ for $x < 0$.

$(\operatorname{sgn} w_1)(\operatorname{sgn} w_2) = (\operatorname{sgn} x_1)(\operatorname{sgn} x_2)$,
so that the last equation becomes

(3) $\iiint e^{i[t_1(\log|x_1| - \log|x_3|) + t_2(\log|x_2| - \log|x_3|)]} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$
$= \iiint e^{i[t_1(\log|y_1| - \log|y_3|) + t_2(\log|y_2| - \log|y_3|)]} \operatorname{sgn} y_1 \operatorname{sgn} y_2\, dH(y_1)\, dH(y_2)\, dH(y_3)$.

Set

$\psi_1(t) = \int e^{it \log|x|}\, dF(x)$;  $\varphi_1(t) = \int e^{it \log|y|}\, dH(y)$;

$\psi_2(t) = \int e^{it \log|x|} \operatorname{sgn} x\, dF(x)$;  $\varphi_2(t) = \int e^{it \log|y|} \operatorname{sgn} y\, dH(y)$.

From (3) we have $\psi_2(t_1)\psi_2(t_2)\psi_1(-t_1 - t_2) = \varphi_2(t_1)\varphi_2(t_2)\varphi_1(-t_1 - t_2)$ for all
$t_1$, $t_2$, and from (2) we have $\psi_1(t) = \varphi_1(t)$ for all $t$, hence, if $\psi_1(-t_1 - t_2) \ne 0$,
$\psi_2(t_1)\psi_2(t_2) = \varphi_2(t_1)\varphi_2(t_2)$. By hypothesis the zeros of $\psi_1(t)$ are nowhere dense,
hence if $\psi_1(-t_1 - t_2) = 0$ there is a sequence $t^{(n)}$, such that $t^{(n)} \to -t_1 - t_2$
and $\psi_1(t^{(n)}) \ne 0$. Now take an arbitrary sequence $t_1^{(n)}$ such that $t_1^{(n)} \to t_1$,
then $t_2^{(n)} = -t^{(n)} - t_1^{(n)}$ must tend to $t_2$. For each $n$ we have $\psi_2(t_1^{(n)})\psi_2(t_2^{(n)}) =
\varphi_2(t_1^{(n)})\varphi_2(t_2^{(n)})$. All the functions appearing are continuous, thus we see that
$\psi_2(t_1)\psi_2(t_2) = \varphi_2(t_1)\varphi_2(t_2)$ for all $t_1$, $t_2$. From this it follows directly that either
$\psi_2(t) = \varphi_2(t)$ for all $t$ or $\psi_2(t) = -\varphi_2(t)$ for all $t$. We have³


$\psi_1(t) = \int_0^{\infty} e^{it \log x}\, dF(x) + \int_{-\infty}^{0} e^{it \log(-x)}\, dF(x)$

$\psi_2(t) = \int_0^{\infty} e^{it \log x}\, dF(x) - \int_{-\infty}^{0} e^{it \log(-x)}\, dF(x)$

³ The assumption has been made implicitly that $F(x)$ and $G(x)$ are continuous at $x = 0$,
otherwise the distribution of $x_\alpha/x_n$ is not properly defined, and the functions $\psi_i(t)$ and
$\varphi_i(t)$ are then not defined. Similar assumptions will be made whenever necessary in later
theorems.



$\varphi_1(t) = \int_0^{\infty} e^{it \log x}\, dH(x) + \int_{-\infty}^{0} e^{it \log(-x)}\, dH(x)$

and $\varphi_2(t) = \int_0^{\infty} e^{it \log x}\, dH(x) - \int_{-\infty}^{0} e^{it \log(-x)}\, dH(x)$.

Combining these expressions with the relations obtained above leads, by Fourier
inversion, to the result that either $F(x) \equiv H(x)$ or $H(x) \equiv 1 - F(-x)$. We
have shown that either $y$ or $-y$ has the same distribution as $x$, that is, either
$e^{-a}x'$ or $-e^{-a}x'$ has the same distribution as $x$.

Theorem Ib states essentially that the joint distribution of the set $x_\alpha/x_n$,
$\alpha = 1, \ldots, n - 1$, determines the distribution of $x$ except for a scale parameter
and possibly a reflection. In the event that $x$ has an asymmetrical distribution,
and if it is desired to rule out negative changes of scale, a variation of this
procedure is necessary. The next result is appropriate for this situation.
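The part played by the reflection can be seen in a small numerical example. The Python sketch below is an illustration only: the three-point distribution and the function names `psi1` and `psi2` are hypothetical, the latter two standing for the transform of $\log|x|$ taken without and with the factor $\operatorname{sgn} x$. Reflecting the distribution about the origin leaves the first transform unchanged and reverses the sign of the second, so that the two together distinguish a distribution from its reflection whenever the second transform is not identically zero.

```python
import cmath
from math import log


def psi1(dist, t):
    """Transform of log|x| for a discrete distribution {value: probability}, no value at 0."""
    return sum(p * cmath.exp(1j * t * log(abs(x))) for x, p in dist.items())


def psi2(dist, t):
    """The same transform weighted by the sign of x."""
    return sum(p * cmath.exp(1j * t * log(abs(x))) * (1 if x > 0 else -1)
               for x, p in dist.items())


# Hypothetical asymmetrical distribution and its reflection about the origin.
dist = {-2.0: 0.3, 0.5: 0.2, 3.0: 0.5}
reflected = {-x: p for x, p in dist.items()}

for t in (0.4, 1.3):
    print(t, abs(psi1(dist, t) - psi1(reflected, t)), abs(psi2(dist, t) + psi2(reflected, t)))
```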

Theorem Ic: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{it \log|x|}\, dF(x)$
are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \ldots, x_n$ and
$x'_1, \ldots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \ge 3$. Express
$x_1, \ldots, x_n$ and $x'_1, \ldots, x'_n$ in spherical coordinates

$x_1 = r \cos\theta_1$,                          $x'_1 = r' \cos\theta'_1$
$x_2 = r \sin\theta_1 \cos\theta_2$,                  $x'_2 = r' \sin\theta'_1 \cos\theta'_2$
   . . .                                    . . .
$x_n = r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-1}$,   $x'_n = r' \sin\theta'_1 \sin\theta'_2 \cdots \sin\theta'_{n-1}$.

Then $\theta_1, \ldots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-1}$ if and only
if there exists a positive constant $k$ such that $kx'$ and $x$ have the same distribution.
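The difference between the ratios of Theorem Ib and the angular coordinates of Theorem Ic can be shown on a single sample. The Python sketch below is an illustration only: the sample values and the function names are hypothetical, and the angles are computed by one common convention in which every angle but the last lies between 0 and $\pi$ and the last retains the sign of $x_n$. The ratios are unchanged by any non-zero change of scale, whereas the angles are unchanged only when the scale factor is positive.

```python
from math import atan2, isclose, sqrt


def ratios(sample):
    """w_alpha = x_alpha / x_n for alpha = 1, ..., n - 1."""
    return [x / sample[-1] for x in sample[:-1]]


def angles(sample):
    """Angular coordinates theta_1, ..., theta_{n-1} of the point (x_1, ..., x_n)."""
    n = len(sample)
    thetas = []
    for j in range(n - 2):
        # Length of the remaining coordinates, equal to r sin(theta_1)...sin(theta_{j+1}).
        tail = sqrt(sum(v * v for v in sample[j + 1:]))
        thetas.append(atan2(tail, sample[j]))
    # The last angle is taken over the full circle so that the sign of x_n is kept.
    thetas.append(atan2(sample[-1], sample[-2]))
    return thetas


def scaled(sample, k):
    return [k * v for v in sample]


def same(u, v):
    return all(isclose(a, b) for a, b in zip(u, v))


sample = [1.5, -2.0, 4.0, 0.5]  # hypothetical sample
print(same(ratios(sample), ratios(scaled(sample, -2.5))))
print(same(angles(sample), angles(scaled(sample, 2.5))))
print(same(angles(sample), angles(scaled(sample, -2.5))))
```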

Proof: Sufficiency of the condition is an immediate consequence of the fact
that $\theta_1, \ldots, \theta_{n-1}$ are invariant under the transformation $x = kx'$, with $k > 0$.
If $\theta_1, \ldots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-1}$ then the set
$\{x_\alpha/x_n\}$ have the same joint distribution as the set $\{x'_\alpha/x'_n\}$, hence, by Theorem
Ib, there exists a constant $c$ such that $cx'$ has the same distribution as $x$. To
establish necessity of the condition we must show that $|c|x'$ has the same
distribution as $x$.

Set $y = |c|x'$, and let $y_1, \ldots, y_n$ be expressed in spherical coordinates;
$y_1, \ldots, y_n$ have the same angular coordinates $\theta'_1, \ldots, \theta'_{n-1}$. This implies
that $x_1/r$ and $x_2/r$ have the same joint distribution as $y_1/R$ and $y_2/R$, where
$R = \sqrt{y_1^2 + \cdots + y_n^2}$; $(x_1/r)/|x_2/r| = x_1/|x_2|$, therefore $x_1/|x_2|$ has the same
distribution as $y_1/|y_2|$, so that

$\iint e^{it \log|x_1/x_2|} \operatorname{sgn}(x_1/|x_2|)\, dF(x_1)\, dF(x_2) = \iint e^{it \log|y_1/y_2|} \operatorname{sgn}(y_1/|y_2|)\, dH(y_1)\, dH(y_2)$

if $y$ has distribution $H(y)$. $\operatorname{sgn}(x_1/|x_2|) = \operatorname{sgn} x_1$, so that the last equation
yields

$\int_{-\infty}^{\infty} e^{it \log|x|} \operatorname{sgn} x\, dF(x) \cdot \int_{-\infty}^{\infty} e^{-it \log|x|}\, dF(x)$
$= \int_{-\infty}^{\infty} e^{it \log|x|} \operatorname{sgn} x\, dH(x) \cdot \int_{-\infty}^{\infty} e^{-it \log|x|}\, dH(x)$.

We know already that $|x|$ and $|y|$ have the same distribution, so that

(4) $\int_{-\infty}^{\infty} e^{it \log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it \log|x|}\, dH(x)$,

thus

(5) $\int_{-\infty}^{\infty} e^{it \log|x|} \operatorname{sgn} x\, dF(x) = \int_{-\infty}^{\infty} e^{it \log|x|} \operatorname{sgn} x\, dH(x)$,

except possibly for zeros of $\int_{-\infty}^{\infty} e^{-it \log|x|}\, dF(x)$. By hypothesis the exceptional
points are nowhere dense, so that, by continuity, (5) holds for all $t$. (4) and
(5) together imply, as in the proof of Theorem Ib, that $F(x) \equiv H(x)$, i.e., $x$ and
$|c|x'$ have the same distribution.

The next three results are generalizations of Theorems Ia, b, c, to analogous
multivariate situations. The first of these is a direct generalization of
Theorem Ia.

Theorem IIa: Let $x_1, \ldots, x_k$ have joint distribution $F(x_1, \ldots, x_k)$ such
that the complement of the set $S$ of zeros of $\int e^{i \sum t_r x_r}\, dF(x_1, \ldots, x_k)$ is $\epsilon$-connected,
where $\epsilon$ is the g.l.b. of $|t|$ for $(t)$ in $S$, and let $y_1, \ldots, y_k$ have joint distribution
$G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, \ldots, n$, be samples
from these distributions, with $n \ge 3$. Then $w_{i\beta} = x_i^\beta - x_i^n$, $i = 1, \ldots, k$,
$\beta = 1, \ldots, n - 1$, have the same joint distribution as the corresponding set $v_{i\beta} =
y_i^\beta - y_i^n$ if and only if there exist constants $a_1, \ldots, a_k$ such that $y_1 + a_1, \ldots,
y_k + a_k$ have the same joint distribution as $x_1, \ldots, x_k$.

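Proof: Set

$\varphi(t_1, \ldots, t_k) = \int e^{i \sum t_r x_r}\, dF(x_1, \ldots, x_k)$,

$\psi(t_1, \ldots, t_k) = \int e^{i \sum t_r y_r}\, dG(y_1, \ldots, y_k)$.

If $w_{i\beta}$, $i = 1, \ldots, k$, $\beta = 1, 2$, have the same joint distribution as $v_{i\beta}$, then,
as in the proof of Theorem Ia, we have

(6) $\varphi(t_{11}, \ldots, t_{k1})\,\varphi(t_{12}, \ldots, t_{k2})\,\varphi(-t_{11} - t_{12}, \ldots, -t_{k1} - t_{k2})$
$= \psi(t_{11}, \ldots, t_{k1})\,\psi(t_{12}, \ldots, t_{k2})\,\psi(-t_{11} - t_{12}, \ldots, -t_{k1} - t_{k2})$.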





Again, as before, $|\varphi| = |\psi|$; $\varphi(t_1, \ldots, t_k)$ and $\psi(t_1, \ldots, t_k)$ are continuous;
$\varphi(0, 0, \ldots, 0) = \psi(0, 0, \ldots, 0) = 1$. There will exist a neighborhood $N$ of
$(0, 0, \ldots, 0)$ such that for $(t_1, \ldots, t_k) \in N$ the function $f(t_1, \ldots, t_k) =
\varphi(t_1, \ldots, t_k)/\psi(t_1, \ldots, t_k)$ is defined and continuous. Then there will exist a neighborhood
$N' \subset N$ such that in $N'$ there exists a uniquely determined branch of
$\arg f(t_1, \ldots, t_k)$, continuous in $N'$, and such that if $(t_1, \ldots, t_k) \in N'$ and
$(u_1, \ldots, u_k) \in N'$ then $\arg f(t_1 + u_1, \ldots, t_k + u_k)$ is also uniquely determined
and continuous. For $(t) \in N'$ and $(u) \in N'$, $\arg f$ satisfies the relation

$\arg f(t_1, \ldots, t_k) + \arg f(u_1, \ldots, u_k) = \arg f(t_1 + u_1, \ldots, t_k + u_k)$.

It is easily shown that any continuous function satisfying the equation above
must be of the form $\sum a_r t_r$, therefore

(7) $\varphi(t_1, \ldots, t_k) = e^{i \sum a_r t_r}\, \psi(t_1, \ldots, t_k)$;  $(t) \in N'$.

Just as in the proof of Ia the relation (7) may be extended, by use of (6), to
hold for all $t$. This implies, finally, that the set $\{y_i + a_i\}$ have the same
joint distribution as the set $\{x_i\}$.

Theorem IIb is a generalization of Theorem Ib to multivariate distributions.

Theorem IIb: Let $x_1, \ldots, x_k$ have distribution $F(x_1, \ldots, x_k)$ such that the
zeros of $\int e^{i \sum t_r \log|x_r|}\, dF(x_1, \ldots, x_k)$ are nowhere dense, and let $y_1, \ldots, y_k$ have
distribution $G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, \ldots, n$,
be samples, with $n \ge 3$. Then the set $w_{i\beta} = x_i^\beta/x_i^n$, $i = 1, \ldots, k$, $\beta = 1, \ldots,
n - 1$, have the same joint distribution as the corresponding set $v_{i\beta} = y_i^\beta/y_i^n$ if
and only if there exist constants $c_1, \ldots, c_k$ such that the set $c_i y_i$ have the same
distribution as the $x_i$.

Proof: The demonstration is parallel to that of Theorem Ib. By Theorem
IIa there exist $a_1,\dots,a_k$ such that

$$E\!\left(e^{i\sum t_r\log|x_r|}\right) = E\!\left(e^{i\sum t_r(\log|y_r| + a_r)}\right).$$

Set $z_r = e^{a_r}y_r$, then

$$\int e^{i\sum t_r\log|x_r|}\,dF(x_1,\dots,x_k) = \int e^{i\sum t_r\log|z_r|}\,dH(z_1,\dots,z_k), \tag{8}$$

where $(z_1,\dots,z_k)$ have distribution function $H(z_1,\dots,z_k)$.

We shall continue the proof from here under the assumption that $k = 2$.
It will be evident how the proof goes for any $k$. We have, since $z_i^\beta/z_i^n$ have the
same joint distribution as $x_i^\beta/x_i^n$,

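$$\iiint e^{i\sum_{r,\beta} t_{r\beta}(\log|x_r^\beta| - \log|x_r^3|)}\,\operatorname{sgn}\!\left(\frac{x_1^1}{x_1^3}\right) dF(x_1^1,x_2^1)\,dF(x_1^2,x_2^2)\,dF(x_1^3,x_2^3)$$

$$= \iiint e^{i\sum_{r,\beta} t_{r\beta}(\log|x_r^\beta| - \log|x_r^3|)}\,\operatorname{sgn}\!\left(\frac{x_1^1}{x_1^3}\right) dH(x_1^1,x_2^1)\,dH(x_1^2,x_2^2)\,dH(x_1^3,x_2^3). \tag{9}$$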





Both members of (9) are evaluated as products, just as was done in previous
proofs, and from the result, combined with (8), we conclude, as in Theorem Ib,
that

$$\iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\,dF(x_1,x_2) = s_1\iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\,dH(x_1,x_2),$$

where $s_1 = \pm 1$, for all $(t_1, t_2)$. Similarly

$$\iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_2\,dF(x_1,x_2) = s_2\iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_2\,dH(x_1,x_2)$$

and

$$\iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\operatorname{sgn}x_2\,dF(x_1,x_2) = s_3\iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\operatorname{sgn}x_2\,dH(x_1,x_2),$$

with $s_2 = \pm 1$, $s_3 = \pm 1$.

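Set

$$\varphi_1(t_1,t_2) = \iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\,dF(x_1,x_2),$$

$$\varphi_2(t_1,t_2) = \iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_2\,dF(x_1,x_2),$$

$$\varphi_{12}(t_1,t_2) = \iint e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\operatorname{sgn}x_2\,dF(x_1,x_2),$$

and let $\psi_1(t_1,t_2)$, $\psi_2(t_1,t_2)$, and $\psi_{12}(t_1,t_2)$ denote the corresponding transforms
of $H(x_1,x_2)$. We have

$$\varphi_1(t_1,t_2) = s_1\psi_1(t_1,t_2), \qquad \varphi_2(t_1,t_2) = s_2\psi_2(t_1,t_2), \qquad \varphi_{12}(t_1,t_2) = s_3\psi_{12}(t_1,t_2), \tag{10}$$

with $s_1 = \pm 1$, $s_2 = \pm 1$, and $s_3 = \pm 1$.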


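Now, as in (9), by considering the corresponding integral in which the sign factor is
$\operatorname{sgn}(x_1^1/x_1^3)\operatorname{sgn}(x_2^2/x_2^3)$, we obtain the relation

$$\varphi_1(t_{11},t_{21})\,\varphi_2(t_{12},t_{22})\,\varphi_{12}(-t_{11}-t_{12},\,-t_{21}-t_{22}) = \psi_1(t_{11},t_{21})\,\psi_2(t_{12},t_{22})\,\psi_{12}(-t_{11}-t_{12},\,-t_{21}-t_{22}),$$

showing that $s_1, s_2, s_3$ may be chosen so that $s_1s_2s_3 = 1$, that is, $s_1s_2 = s_3$.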





Consider now the variates $z_r' = s_r z_r$, $r = 1, 2$. Let $K(z_1', z_2')$ be the distribution
function of $z_1', z_2'$. If we let $\theta_1(t_1,t_2)$, $\theta_2(t_1,t_2)$, and $\theta_{12}(t_1,t_2)$ be the
transforms of $K$ which correspond to $\varphi_1(t_1,t_2)$, $\varphi_2(t_1,t_2)$, and $\varphi_{12}(t_1,t_2)$ respectively,
it is evident that

$$\varphi_1(t_1,t_2) = \theta_1(t_1,t_2), \qquad \varphi_2(t_1,t_2) = \theta_2(t_1,t_2), \qquad \varphi_{12}(t_1,t_2) = \theta_{12}(t_1,t_2). \tag{11}$$

Moreover, from (8),

$$\iint e^{i\sum t_r\log|x_r|}\,dF(x_1,x_2) = \iint e^{i\sum t_r\log|x_r|}\,dK(x_1,x_2).$$

The last relation, together with the equations (11), implies that $F(x)$ and $K(x)$
coincide in each quadrant, thus $F(x_1,x_2) \equiv K(x_1,x_2)$ for all $x_1, x_2$.

The final result is that $z_1', z_2'$ have the same distribution as $x_1, x_2$, i.e., $s_1e^{a_1}y_1$
and $s_2e^{a_2}y_2$ have the same joint distribution as $x_1$ and $x_2$.

The next result bears the same relation to Theorem IIb that Theorem Ic
bears to Theorem Ib, that is, only positive scale changes are to be permitted.

Theorem IIc: Let $x_1,\dots,x_k$ have distribution $F(x_1,\dots,x_k)$ such that the
zeros of $\int e^{i\sum t_r\log|x_r|}\,dF(x_1,\dots,x_k)$ are nowhere dense, and let $y_1,\dots,y_k$ have
distribution $G(y_1,\dots,y_k)$. Let $(x_1^\alpha,\dots,x_k^\alpha)$ and $(y_1^\alpha,\dots,y_k^\alpha)$, $\alpha = 1, 2, \dots, n$,
be samples with $n \ge 3$. Express $x_i^1,\dots,x_i^n$ and $y_i^1,\dots,y_i^n$ in spherical
coordinates

$$\begin{aligned}
x_i^1 &= r_i\cos\theta_i^1, & y_i^1 &= R_i\cos\varphi_i^1,\\
x_i^2 &= r_i\sin\theta_i^1\cos\theta_i^2, & y_i^2 &= R_i\sin\varphi_i^1\cos\varphi_i^2,\\
&\;\;\vdots & &\;\;\vdots\\
x_i^n &= r_i\sin\theta_i^1\cdots\sin\theta_i^{n-1}; & y_i^n &= R_i\sin\varphi_i^1\cdots\sin\varphi_i^{n-1}.
\end{aligned}$$

Then $\{\theta_i^\beta\}$, $i = 1,\dots,k$, $\beta = 1,\dots,n-1$, have the same joint distribution
as $\{\varphi_i^\beta\}$ if and only if there exist constants $k_i > 0$, $i = 1,\dots,k$, such that the
set $k_iy_i$ have the same joint distribution as the set $x_i$.

Proof: If $\{\theta_i^\beta\}$ have the same distribution as $\{\varphi_i^\beta\}$ then it follows that
$\{x_i^\beta/x_i^n\}$ have the same distribution as $\{y_i^\beta/y_i^n\}$, hence by Theorem IIb there exist constants
$c_i$ such that $\{c_iy_i\}$ have the same distribution as $\{x_i\}$. Set $z_i = |c_i|\,y_i$; we
wish to show that $\{z_i\}$ have the same distribution as $\{x_i\}$. By equation (8)
in Theorem IIb it is known that $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$,
moreover, if we express $z_i^\alpha$ in spherical coordinates, the angular coordinates are
the same as those of $y_i^\alpha$, therefore $\{z_i^\alpha/|z_i^\beta|\}$ have the same distribution as
$\{x_i^\alpha/|x_i^\beta|\}$, since these functions are obtainable in terms of the angular coordinates.

As before, we shall continue the proof from here under the assumption that
$k = 2$. The procedure is a generalization of the procedure in the proof of
Theorem Ic. $\operatorname{sgn}x_i^1 = \operatorname{sgn}\{x_i^1/|x_i^2|\}$, and similarly for $y$, therefore

$$\iint e^{i\sum t_r(\log|x_r^1| - \log|x_r^2|)}\,\operatorname{sgn}x_i^1\,dF(x_1^1,x_2^1)\,dF(x_1^2,x_2^2) = \iint e^{i\sum t_r(\log|x_r^1| - \log|x_r^2|)}\,\operatorname{sgn}x_i^1\,dH(x_1^1,x_2^1)\,dH(x_1^2,x_2^2), \quad i = 1, 2, \tag{12}$$

where it is assumed that $z_1, z_2$ have distribution $H(z_1,z_2)$. As before, set

$$\varphi(t_1,t_2) = \int e^{i\sum t_r\log|x_r|}\,dF(x_1,x_2),$$

$$\varphi_i(t_1,t_2) = \int e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_i\,dF(x_1,x_2), \quad i = 1, 2,$$

$$\varphi_{12}(t_1,t_2) = \int e^{i\sum t_r\log|x_r|}\,\operatorname{sgn}x_1\operatorname{sgn}x_2\,dF(x_1,x_2),$$

and denote the corresponding transforms of $H(x_1,x_2)$ by $\theta(t_1,t_2)$, $\theta_1(t_1,t_2)$,
$\theta_2(t_1,t_2)$, and $\theta_{12}(t_1,t_2)$. It has been remarked already that $\{|z_i|\}$ have the
same distribution as $\{|x_i|\}$, therefore $\theta(t_1,t_2) = \varphi(t_1,t_2)$. Equation (12) yields
the relation $\varphi_i(t_1,t_2)\varphi(-t_1,-t_2) = \theta_i(t_1,t_2)\theta(-t_1,-t_2)$, $i = 1, 2$; the zeros of
$\varphi(t_1,t_2)$ are nowhere dense, so that it can be concluded that $\varphi_i(t_1,t_2) = \theta_i(t_1,t_2)$,
$i = 1, 2$. Now, from an equation similar to (12) we obtain $\varphi_{12}(t_1,t_2) = \theta_{12}(t_1,t_2)$.
As in Theorem IIb, the four relations above together imply that $F(x_1,x_2) \equiv H(x_1,x_2)$,
in other words, $\{|c_i|\,y_i\}$ have the same distribution as $\{x_i\}$.

We are now in a position to combine some of the preceding theorems so as to
obtain analogous results for scale and location parameters together.

Theorem IIIa: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{itx}\,dF(x)$
satisfy the condition of Theorem Ia, and the zeros of

$$\iiint e^{it_1\log|x_1-x_3| + it_2\log|x_2-x_3|}\,dF(x_1)\,dF(x_2)\,dF(x_3)$$

are nowhere dense, and let $y$ have distribution $G(y)$. Let $x_1,\dots,x_n$ and
$y_1,\dots,y_n$ be samples, with $n \ge 9$. Then

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1,\dots,n-2,$$

have the same joint distribution as the corresponding set $w_\alpha' = \dfrac{y_\alpha - y_n}{y_{n-1} - y_n}$ if and
only if there exist constants $a$, $c$, such that $c(y - a)$ and $x$ have the same distribution.





Proof: Sufficiency of the condition is an immediate consequence of the fact
that $w_\alpha'$ is invariant under transformations of the form $y' = c(y - a)$. Assume
then that $\{w_\alpha\}$ and $\{w_\alpha'\}$ have the same joint distribution. By elementary
transformations it is evident that the functions

$$\frac{x_1 - x_3}{x_7 - x_9}, \quad \frac{x_2 - x_3}{x_8 - x_9}, \quad \frac{x_4 - x_6}{x_7 - x_9}, \quad \frac{x_5 - x_6}{x_8 - x_9}$$

have the same joint distribution as the corresponding functions of the $y$'s, if
$n \ge 9$. Since $x_1,\dots,x_n$ form a sample it follows that the pairs $\{x_1 - x_3,\, x_2 - x_3\}$,
$\{x_4 - x_6,\, x_5 - x_6\}$, $\{x_7 - x_9,\, x_8 - x_9\}$ have the same joint distributions
and are independent of one another, and similarly for the corresponding functions
of the $y$'s. Theorem IIb assures the existence of constants $c_1, c_2$, such
that $c_1(y_1 - y_3)$, $c_2(y_2 - y_3)$ have the same joint distribution as $(x_1 - x_3)$,
$(x_2 - x_3)$. Considering separately the marginal distributions it is seen that
$c_1(y_1 - y_3)$ has the same distribution as $c_2(y_2 - y_3)$. $y_1 - y_3$ and $y_2 - y_3$ have
the same distribution, therefore either $c_2 = c_1$, or $c_2 = -c_1$. Set $u_\alpha = x_\alpha - x_3$,
$v_\alpha = c_1(y_\alpha - y_3)$, $\alpha = 1, 2$. We have, for the distributions of $(u_1, u_2)$ and
$(v_1, v_2)$, relations corresponding to (10) in Theorem IIb, with the additional
condition that $s_1 = s_2$, because of the symmetry in the variables. This implies
that either $(v_1, v_2)$ or $(-v_1, -v_2)$ have the same joint distribution as $(u_1, u_2)$,
that is, there exists $c$ such that $c(y_1 - y_3)$ and $c(y_2 - y_3)$ have the same joint
distribution as $x_1 - x_3$ and $x_2 - x_3$. Application of Theorem Ia now completes
the proof.

Just as before, there is an analogous situation when we consider angular
coordinates instead of quotients. The proof is immediate; the angular coordinates
determine the angular coordinates of $\{x_1 - x_3,\, x_2 - x_3\}$, $\{x_4 - x_6,\, x_5 - x_6\}$,
and $\{x_7 - x_9,\, x_8 - x_9\}$, arranged as a sample. Then the constants $c_1, c_2$ in
the proof of Theorem IIIa are both positive; it follows that $c_1 = c_2$. Application
of Theorem Ia gives

Theorem IIIb: Let $x_1,\dots,x_n$ and $y_1,\dots,y_n$ satisfy the hypotheses of
Theorem IIIa. Set

$$\begin{aligned}
x_1 - x_n &= r\cos\theta_1, & y_1 - y_n &= r'\cos\theta_1',\\
x_2 - x_n &= r\sin\theta_1\cos\theta_2, & y_2 - y_n &= r'\sin\theta_1'\cos\theta_2',\\
&\;\;\vdots & &\;\;\vdots\\
x_{n-1} - x_n &= r\sin\theta_1\cdots\sin\theta_{n-2}; & y_{n-1} - y_n &= r'\sin\theta_1'\cdots\sin\theta_{n-2}'.
\end{aligned}$$

Then $\theta_1,\dots,\theta_{n-2}$ have the same joint distribution as $\theta_1',\dots,\theta_{n-2}'$ if and only if
there exist constants $a$ and $c > 0$ such that $c(y - a)$ has the same distribution as $x$.

Theorem IVa is a generalization of Theorem Ia to cover arbitrary linear combinations
of some subset of the sample.

Theorem IVa: Suppose $x$ has distribution $F(x)$ such that $\int e^{itx}\,dF(x)$ does not
vanish, and let $y$ have distribution $G(y)$. Consider the functions

$$w_\alpha = x_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\,x_{m+\beta}, \qquad w_\alpha' = y_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\,y_{m+\beta}, \qquad \alpha = 1, 2, \dots, m, \quad \beta = 1, 2, \dots, n-m,$$

and suppose that $m > n - m$. Then, if $\{w_\alpha\}$ have the same joint distribution
as $\{w_\alpha'\}$ and if $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \ne 1$ for some $\alpha$, it follows that $F(y) \equiv G(y)$; if
$\sum_{\beta=1}^{n-m} l_{\alpha\beta} = 1$ for all $\alpha$ there exists a constant $a$ such that $F(y - a) \equiv G(y)$.

Proof: Denote the characteristic functions of $x$ and $y$ by $\varphi(t)$ and $\psi(t)$ respectively.
By expressing the fact that $\{w_\alpha\}$ and $\{w_\alpha'\}$, $\alpha = 1, 2, \dots, n-m+1$,
have the same characteristic function we obtain the functional equation

$$\prod_{\alpha=1}^{n-m+1}\varphi(t_\alpha)\prod_{\beta=1}^{n-m}\varphi\!\left(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\right) = \prod_{\alpha=1}^{n-m+1}\psi(t_\alpha)\prod_{\beta=1}^{n-m}\psi\!\left(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\right).$$

By hypothesis $\varphi(t)$ does not vanish, therefore $\psi(t)$ has no zeros, because of the
relation above. $\varphi(t)$ and $\psi(t)$ are continuous, thus the function
$f(t) = \log\varphi(t) - \log\psi(t)$ can be uniquely defined in a continuous manner for all $t$.
The equation above becomes

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\!\left(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\right) = 0. \tag{13}$$


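The constants $l_{\alpha\beta}$ are necessarily linearly dependent, so that, for some $\alpha$, $l_{\alpha\beta}$
can be expressed as a linear combination of the others; suppose then that

$$l_{n-m+1,\beta} = \sum_{\alpha=1}^{n-m} e_\alpha l_{\alpha\beta}, \qquad \beta = 1, \dots, n-m.$$

Putting these values in (13) we have

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\!\left(-\sum_{\alpha=1}^{n-m} l_{\alpha\beta}\,(t_\alpha + e_\alpha t_{n-m+1})\right) = 0. \tag{14}$$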

It can be assumed that $\sum|e_\alpha| \ne 0$, for, if $e_\alpha = 0$ for all $\alpha$, we have $l_{n-m+1,\beta} = 0$,
$\beta = 1, \dots, n-m$, that is, $w_{n-m+1}' = y_{n-m+1}$ and $w_{n-m+1} = x_{n-m+1}$, hence $x$
and $y$ have the same distribution. Assuming $e_1 \ne 0$, set $t_\alpha = -e_\alpha t_{n-m+1}$,
$\alpha = 2, \dots, n-m$, in (14), obtaining

$$f(t_1) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) + f(t_{n-m+1}) + \sum_{\beta=1}^{n-m} f\bigl(-l_{1\beta}(t_1 + e_1t_{n-m+1})\bigr) = 0; \tag{15}$$

now, recalling that $f(0) = 0$, set $t_{n-m+1} = 0$, getting $f(t_1) + \sum_{\beta=1}^{n-m} f(-l_{1\beta}t_1) = 0$.
Evaluating this with argument $t_1 + e_1t_{n-m+1}$, and substituting back in (15) it
appears that

$$f(t_1) + f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) = f(t_1 + e_1t_{n-m+1}). \tag{16}$$

Now setting $t_1 = 0$ in (16) we have the relation

$$f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) = f(e_1t_{n-m+1}),$$





thus we have finally $f(t_1) + f(e_1t_{n-m+1}) = f(t_1 + e_1t_{n-m+1})$, or, since $e_1 \ne 0$,
$f(t_1 + t_2) = f(t_1) + f(t_2)$. The last relation implies that $f(t) = ct$, since $f(t)$ is
continuous. Now replace $f(t)$ by $ct$ in (13), getting

$$c\left\{\sum_{\alpha=1}^{n-m+1} t_\alpha - \sum_{\alpha=1}^{n-m+1}\sum_{\beta=1}^{n-m} l_{\alpha\beta}t_\alpha\right\} = 0,$$

that is, either $c = 0$, or $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$. We conclude then that $\varphi(t) \equiv \psi(t)$,
unless $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$. If $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$ we have $\varphi(t) = e^{ct}\psi(t)$.
$\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$, hence $c$ is of the form $c = ia$, where $a$ is real,
in other words $\varphi(t) = e^{iat}\psi(t)$, thus concluding the proof of the theorem.

It was assumed in Theorem IVa that $\varphi(t)$ has no zeros. If $\varphi(t)$ has zeros
we have proved that, for an interval $|t| < \epsilon$, $\varphi(t) = \psi(t)$ (or $\varphi(t) = e^{iat}\psi(t)$).
This does not necessarily imply the result of Theorem IVa, but it does imply
at least that if the $k$th moments of $x$ and of $y$ (or of $y - a$) both exist they
are equal.

The last result in this series can be proved by methods similar to those used 
in Theorem IVa. 


Theorem IVb: Let $x$ and $y$ satisfy the hypotheses of Theorem IVa. Suppose,
moreover, that $m > 2(n - m)$, that the rank of $\|l_{\alpha\beta}\|$ is $n - m$, and that
$\sum_{\beta=1}^{n-m} l_{\alpha\beta} \ne 1$ for at least $2m - n$ values of $\alpha$. Then, if there exist constants $\{c_\alpha\}$
such that the set $\{c_\alpha w_\alpha'\}$ have the same joint distribution as $\{w_\alpha\}$, it follows that,
for some $\alpha$, $c_\alpha y$ has the same distribution as $x$.


3. Application to Composite Hypotheses. The results of section 2 have a
significant application in the theory of testing composite hypotheses. Suppose
that $x$ has a distribution of the form $F(x, \theta_1, \theta_2)$, and that the hypothesis
$\theta_2 = \theta_2^0$ is to be tested, without reference to the value of $\theta_1$. We assume that
the parameters are independent, i.e., $F(x, \theta_1, \theta_2) \equiv F(x, \theta_1', \theta_2')$ implies that
$\theta_1 = \theta_1'$ and $\theta_2 = \theta_2'$. It is true in a wide class of important cases that, given
a sample $x_1,\dots,x_n$ from the distribution $F(x, \theta_1, \theta_2)$, there exist functions
$y_\alpha(x_1,\dots,x_n)$, $\alpha = 1, 2, \dots, p$, such that $\{y_\alpha\}$ have joint distribution independent
of $\theta_1$, but depending on $\theta_2$. Now if the $\{y_\alpha\}$ are such that their joint
distribution redetermines the original distribution, except for $\theta_1$, one can reasonably
use the $p$-dimensional distribution of the $\{y_\alpha\}$ for testing the hypothesis
$\theta_2 = \theta_2^0$, thus reducing the composite hypothesis to a simple hypothesis. In
testing this simple hypothesis, every alternative hypothesis (corresponding to a
value of $\theta_2$) determines a distribution of $x$ among the alternatives $F(x, \theta_1, \theta_2)$
except for the unknown $\theta_1$, that is, there is a one-to-one correspondence between
the two sets of alternative hypotheses, expressed by the fact that if $\theta_2' \ne \theta_2''$
then the distributions of the set $\{y_\alpha\}$ corresponding to $\theta_2 = \theta_2'$ and $\theta_2 = \theta_2''$
must be different.

Suppose, for example, that it is desired to test whether $y = x - a$ for some $a$
has the distribution $F(y, \theta^0)$, with the assumption that, for some $a$, $y$ has the
distribution $F(y, \theta)$. Given a sample one can form the set $w_\alpha = x_\alpha - x_n$,
$\alpha = 1, 2, \dots, n-1$, obtaining the distribution $G(w_1,\dots,w_{n-1}, \theta)$; now consider
the simple hypothesis $\theta = \theta^0$, knowing that $G$ determines $\theta$, by Theorem Ia.
Similarly one can test whether $cx$, for some $c \ne 0$, has distribution $F(y, \theta^0)$,
by forming $w_\alpha = x_\alpha/x_n$, $\alpha = 1, \dots, n-1$, or by expressing $(x_1,\dots,x_n)$
in spherical coordinates and considering the angular coordinates, according to
whether both positive and negative or only positive values of $c$ are to be allowed.

In the same way one can test the hypothesis $\theta = \theta^0$ under the assumption
that $c(x - a)$ has distribution $F(y, \theta)$ by forming

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \dots, n-2,$$

or by expressing $(x_1 - x_n, \dots, x_{n-1} - x_n)$ in spherical coordinates and
considering the angular coordinates.
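
The three reductions just described rest on the fact that each set of functions is unchanged by the corresponding group of transformations. The following sketch, which is an illustration added here and not part of the paper, checks this numerically in Python for an arbitrary made-up sample; the function names `differences`, `ratios`, `studentized_ratios` and `close` are hypothetical.

```python
# Illustrative check of invariance; all names and numbers are assumptions.

def differences(x):
    """w_a = x_a - x_n: unchanged when a constant is added to every x."""
    return [xa - x[-1] for xa in x[:-1]]

def ratios(x):
    """w_a = x_a / x_n: unchanged when every x is multiplied by c != 0."""
    return [xa / x[-1] for xa in x[:-1]]

def studentized_ratios(x):
    """w_a = (x_a - x_n) / (x_{n-1} - x_n): unchanged under x -> c(x - a)."""
    denom = x[-2] - x[-1]
    return [(xa - x[-1]) / denom for xa in x[:-2]]

def close(u, v, tol=1e-9):
    """True when two lists agree entry by entry within tol."""
    return len(u) == len(v) and all(abs(p - q) <= tol for p, q in zip(u, v))

sample = [2.0, -1.5, 0.25, 4.0, 3.0]
shifted = [xi - 7.0 for xi in sample]          # location change only
scaled = [-3.0 * xi for xi in sample]          # scale change, negative c allowed
affine = [-3.0 * (xi - 7.0) for xi in sample]  # location and scale together

print(close(differences(sample), differences(shifted)))
print(close(ratios(sample), ratios(scaled)))
print(close(studentized_ratios(sample), studentized_ratios(affine)))
```

Invariance alone only shows that the nuisance parameters drop out; that the reduced statistics still determine the distribution is the content of Theorems Ia, Ib and IIIa.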

Theorem IVa may be applied to analogous problems, in which the hypothesis
$\theta = \theta^0$ is to be tested under the assumption that $y = u - \sum a_iz_i$ has distribution
$F(y, \theta)$ for fixed values of the $z_i$, with the $a_i$ unknown. In such problems
there exist linear combinations of the observed values of $y$ which are independent
of the $a_i$. By Theorem IVa, under certain conditions the joint distribution of
these linear combinations determines the original distribution of $y$, without
regard to the $a_i$.

In applying some of the preceding results we must verify in certain cases that
the zeros of $\int e^{itx}\,dF(x)$ are nowhere dense, for a certain distribution function.
By a change of variable the condition of Theorem Ib can be stated in this form;
moreover if $F(x)$ satisfies this condition it is evident that it satisfies the condition
of Theorem Ia. A sufficient condition applicable to a considerable class
of cases has been obtained by Levinson [4]; if $f(x)$ is $O(e^{-\theta(x)})$ as $x \to \infty$, where
$\theta(x)$ is monotone and $\int_1^\infty \frac{\theta(x)}{x^2}\,dx$ diverges to $\infty$, then $\int e^{itx}f(x)\,dx$ cannot vanish
on an interval without vanishing identically. It is evident that it is likewise
sufficient if the corresponding condition holds as $x \to -\infty$ instead of $+\infty$. In
particular, if there exists $A$ such that $f(x) = 0$ for $x > A$ (or for $x < A$) it is a
consequence of the Levinson result that $\int e^{itx}f(x)\,dx$ has no intervals of zeros.
It can be established easily that if $f(x)$ is majorized by $|x|^{-(1-\epsilon)}$, $\epsilon > 0$, in the
neighborhood of the origin, then $\int e^{it\log|x|}f(x)\,dx$ has no intervals of zeros.


As a simple example consider the rectangular distribution on (0, 1). Let
$(x - a)/r$ have this distribution with $a$ unknown, $r > 0$, and suppose that we
are interested only in $r$. Given a sample $x_1,\dots,x_n$ form the functions
$y_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, \dots, n-1$. Set $y_M = \max(y_\alpha, 0)$, $y_L = \min(y_\alpha, 0)$.
Then it can be shown that $y_1,\dots,y_{n-1}$ have probability density $(1 - y_M + y_L)$
in the region $-1 < y_\alpha < 1$, $y_M - y_L < 1$, zero elsewhere. $\xi = y_M - y_L$ is
of course the quotient of the sample range by $r$. It can be shown that $\xi$ has
density $n(n-1)(1-\xi)\xi^{n-2}\,d\xi$. Theorem Ia makes it possible to base any
tests not involving $a$ on the distribution of the $y_\alpha$, since if the $y_\alpha$ have the
stated distribution then $(x - a)/r$ for some $a$ must have the rectangular distribution.
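
As a rough numerical check of the range density $n(n-1)(1-\xi)\xi^{n-2}$, the sketch below (an addition for illustration, not the author's computation) integrates the density by Simpson's rule and compares its mean with a seeded simulation. The values `n = 5`, `a = 10.0`, `r = 2.5` and the helper names are arbitrary assumptions.

```python
import random

def range_density(xi, n):
    """Density n(n-1)(1-xi)xi^(n-2) of (sample range)/r for n rectangular variates."""
    return n * (n - 1) * (1.0 - xi) * xi ** (n - 2)

def integrate(f, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

n = 5
total_mass = integrate(lambda u: range_density(u, n), 0.0, 1.0)
mean_range = integrate(lambda u: u * range_density(u, n), 0.0, 1.0)

# Simulation: a is the nuisance location, r the scale; both chosen arbitrarily.
rng = random.Random(0)
a, r, trials = 10.0, 2.5, 20000
simulated_mean = 0.0
for _ in range(trials):
    xs = [a + r * rng.random() for _ in range(n)]
    simulated_mean += (max(xs) - min(xs)) / r
simulated_mean /= trials

print(total_mass, mean_range, simulated_mean)
```

The integral of the density should be 1 and its mean $(n-1)/(n+1)$; the simulated mean does not depend on the value chosen for `a`, which is the point of working with differences.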

Similarly, suppose $y = (x - a)/r$ has the distribution $e^{-y}$, $y > 0$, for some
$a$, $r$. Then $w_\alpha = \dfrac{x_\alpha - x_n}{r}$, $\alpha = 1, 2, \dots, n-1$, have distribution density
$\dfrac{1}{n}\,e^{-\sum w_\alpha + n w_L}$, where $w_L = \min(0, w_\alpha)$. Again, the latter distribution may be
used to estimate $r$.
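
The density of the $w_\alpha$ in the exponential case can be verified by a direct integration. The derivation below is a sketch supplied for the reader and is not taken from the paper; the auxiliary variables $u_\alpha = (x_\alpha - a)/r$ are introduced only for this purpose.

```latex
% Sketch: u_1, ..., u_n independent with density e^{-u}, u > 0, and w_\alpha = u_\alpha - u_n.
\begin{aligned}
p(w_1,\dots,w_{n-1},u_n)
  &= \exp\Bigl\{-\sum_{\alpha=1}^{n-1}(w_\alpha+u_n) - u_n\Bigr\}
   = \exp\Bigl\{-\sum_{\alpha=1}^{n-1} w_\alpha - n\,u_n\Bigr\},
  \qquad u_n > 0,\; u_n > -w_\alpha \ \text{for all } \alpha. \\
% The constraints combine to u_n > -w_L with w_L = \min(0, w_1, \dots, w_{n-1}).
p(w_1,\dots,w_{n-1})
  &= e^{-\sum w_\alpha}\int_{-w_L}^{\infty} e^{-n u_n}\,du_n
   = \frac{1}{n}\,e^{-\sum w_\alpha + n\,w_L}.
\end{aligned}
```

The nuisance parameter $a$ has disappeared, while $r$ enters only through the scaling of the $w_\alpha$.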

Let us examine the distributions of functions of the type considered, in the
case of normality. Assume that $x_1,\dots,x_n$ are a sample of $n$ observations
from a normal distribution with unit variance and unknown mean. The
variables $y_\alpha = x_\alpha - x_1$, $\alpha = 2, \dots, n$, have a joint normal distribution with
zero means and matrix of variances and covariances $\|\lambda_{ij}\| = \|1 + \delta_{ij}\|$.
Then Theorem Ia shows that if $\{y_\alpha\}$ have this joint distribution then $x$ is normally
distributed with unit variance. Note that $\chi^2_{n-1} = \sum\lambda^{ij}y_iy_j = \sum(x_\alpha - \bar{x})^2$.
If we had $x = x'/\sigma$, then $\sum(x_\alpha' - \bar{x}')^2 = \sigma^2\chi^2_{n-1}$, giving the estimate
$\dfrac{1}{n-1}\sum(x_\alpha' - \bar{x}')^2$ for $\sigma^2$.
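
The identity between the quadratic form in the differences and the sum of squares about the mean can be checked numerically. The sketch below is an added illustration; it assumes the closed form $I - J/n$ for the inverse of the matrix $\|1 + \delta_{ij}\|$ (with $J$ the matrix of ones), which `check_inverse` confirms rather than takes on trust. All function names and the sample are hypothetical.

```python
def sum_sq_about_mean(x):
    """Sum of squared deviations of x from its mean."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x)

def quadratic_form(x):
    """y' L^{-1} y for y_a = x_a - x_1, a = 2..n, using L^{-1} = I - J/n
    (assumed closed form, verified by check_inverse)."""
    n = len(x)
    y = [xa - x[0] for xa in x[1:]]
    return sum(v * v for v in y) - sum(y) ** 2 / n

def check_inverse(n):
    """Largest entrywise deviation of L * (I - J/n) from the identity,
    where L = ||1 + delta_ij|| has order n - 1."""
    k = n - 1
    lam = [[1.0 + (i == j) for j in range(k)] for i in range(k)]
    inv = [[(i == j) - 1.0 / n for j in range(k)] for i in range(k)]
    worst = 0.0
    for i in range(k):
        for j in range(k):
            entry = sum(lam[i][m] * inv[m][j] for m in range(k))
            worst = max(worst, abs(entry - (i == j)))
    return worst

x = [1.2, -0.7, 3.4, 0.0, 2.1, -1.9]   # arbitrary illustrative sample
print(quadratic_form(x), sum_sq_about_mean(x), check_inverse(len(x)))
```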


There are, of course, many ways in which the matrix $\|\lambda_{ij}\|$ may be transformed
into a diagonal matrix in order to obtain a new set of independently
distributed variates; one convenient set is the set

$$\sqrt{\tfrac{1}{2}}\;y_2, \quad \sqrt{\tfrac{2}{3}}\,\bigl(y_3 - \tfrac{1}{2}y_2\bigr), \quad \dots, \quad \sqrt{\frac{n-1}{n}}\left(y_n - \frac{1}{n-1}\sum_{\alpha=2}^{n-1} y_\alpha\right).$$

In terms of the original $x$'s we have

$$\sqrt{\tfrac{1}{2}}\,(x_2 - x_1), \quad \sqrt{\tfrac{2}{3}}\,\bigl(x_3 - \tfrac{1}{2}(x_1 + x_2)\bigr), \quad \dots, \quad \sqrt{\frac{n-1}{n}}\left(x_n - \frac{1}{n-1}\sum_{\alpha=1}^{n-1} x_\alpha\right);$$

these functions of the data are independently distributed according to the normal
distribution with zero mean and unit variance.
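
The independence and unit variance of these functions follow from the orthonormality of their coefficient vectors. The following sketch is an added illustration under that reading; the names `helmert_contrasts`, `contrast_rows` and `max_gram_error` are hypothetical.

```python
import math

def helmert_contrasts(x):
    """z_m = sqrt(m/(m+1)) * (x_{m+1} - mean(x_1..x_m)), m = 1..n-1."""
    out = []
    running = 0.0
    for m in range(1, len(x)):
        running += x[m - 1]
        out.append(math.sqrt(m / (m + 1)) * (x[m] - running / m))
    return out

def contrast_rows(n):
    """Coefficients of each z_m as a linear function of x_1..x_n."""
    rows = []
    for m in range(1, n):
        c = math.sqrt(m / (m + 1))
        rows.append([-c / m] * m + [c] + [0.0] * (n - m - 1))
    return rows

def max_gram_error(n):
    """Largest deviation of the rows' Gram matrix from the identity."""
    rows = contrast_rows(n)
    worst = 0.0
    for i, u in enumerate(rows):
        for j, v in enumerate(rows):
            dot = sum(p * q for p, q in zip(u, v))
            worst = max(worst, abs(dot - (i == j)))
    return worst

x = [0.4, 1.9, -2.2, 3.1, 0.7]   # arbitrary illustrative sample
z = helmert_contrasts(x)
mean = sum(x) / len(x)
print(max_gram_error(len(x)), sum(v * v for v in z), sum((v - mean) ** 2 for v in x))
```

Orthonormal rows give uncorrelated, hence independent, normal variates of unit variance, and each row sums to zero, so the unknown mean cancels.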

Similarly, in the case of a sample $x_1,\dots,x_n$ from a normal distribution with
zero mean and unknown variance, there exists a set of $n - 1$ functions with
distributions independent of the variance. A convenient set of functions is
the set

$$t_m = \frac{\sqrt{m}\;x_{m+1}}{\sqrt{\sum_{i=1}^{m} x_i^2}}, \qquad m = 1, \dots, n-1.$$

It is known (see Bartlett [1]) that the variables $t_m$ are independently distributed
according to Student $t$-distributions with $m$ degrees of freedom respectively.
The set $t_m$ determines the set of angular coordinates obtained by expressing
$x_1,\dots,x_n$ in spherical coordinates, hence we can conclude, conversely, that if
$\{t_m\}$ have this joint distribution then $x$ is normal with mean zero.








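Finally we can eliminate both mean and variance. Suppose $x_1,\dots,x_n$ are a
sample from some normal distribution. The variables

$$w_m = \sqrt{\frac{m}{m+1}}\left(x_{m+1} - \frac{1}{m}\sum_{i=1}^{m} x_i\right), \qquad m = 1, 2, \dots, n-1,$$

are normal and independent with mean zero and some variance. Then we have
the set

$$t_r' = \frac{\sqrt{r}\;w_{r+1}}{\sqrt{\sum_{i=1}^{r} w_i^2}}, \qquad r = 1, \dots, n-2,$$

independently distributed according to $t$-distributions with $r$ degrees of freedom
respectively. It may be convenient for computational purposes to make use of
the identity

$$\sum_{j=1}^{r} w_j^2 = \sum_{j=1}^{r+1} x_j^2 - \frac{1}{r+1}\left(\sum_{j=1}^{r+1} x_j\right)^2 = \sum_{j=1}^{r+1}\bigl(x_j - \bar{x}_{(r+1)}\bigr)^2.$$

We then have

$$t_r' = \sqrt{\frac{r(r+1)}{r+2}}\;\frac{x_{r+2} - \bar{x}_{(r+1)}}{\sqrt{\sum_{j=1}^{r+1}\bigl(x_j - \bar{x}_{(r+1)}\bigr)^2}}, \qquad r = 1, \dots, n-2.$$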
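
The two routes to the $t_r'$, through the variables $w_m$ or directly from running means and sums of squares, can be compared numerically. The sketch below is an added illustration; it takes $w_m = \sqrt{m/(m+1)}\,(x_{m+1} - \bar{x}_{(m)})$ and $t_r' = \sqrt{r}\,w_{r+1}/\sqrt{\sum_{i\le r} w_i^2}$ as the definitions, and the function names and sample are hypothetical.

```python
import math

def w_variables(x):
    """w_m = sqrt(m/(m+1)) * (x_{m+1} - mean(x_1..x_m)), m = 1..n-1."""
    out, running = [], 0.0
    for m in range(1, len(x)):
        running += x[m - 1]
        out.append(math.sqrt(m / (m + 1)) * (x[m] - running / m))
    return out

def t_prime_from_w(x):
    """t'_r = sqrt(r) * w_{r+1} / sqrt(w_1^2 + ... + w_r^2), r = 1..n-2."""
    w = w_variables(x)
    return [math.sqrt(r) * w[r] / math.sqrt(sum(v * v for v in w[:r]))
            for r in range(1, len(w))]

def t_prime_direct(x):
    """Same quantities from the mean and sum of squares of x_1..x_{r+1}."""
    out = []
    for r in range(1, len(x) - 1):
        head = x[: r + 1]
        mean = sum(head) / (r + 1)
        ss = sum((v - mean) ** 2 for v in head)
        out.append(math.sqrt(r * (r + 1) / (r + 2)) * (x[r + 1] - mean) / math.sqrt(ss))
    return out

x = [4.1, 2.7, 5.3, 3.9, 6.0, 1.8]          # arbitrary illustrative sample
rescaled = [3.5 * (v - 2.0) for v in x]     # positive scale change plus shift
print(t_prime_from_w(x))
print(t_prime_direct(rescaled))
```

Both lists should agree, showing that the $t_r'$ are free of the mean and of any positive scale factor.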


Now, by Theorem IIIb, we know that if the set $\{t_r'\}$ has this specified distribution
then $x$ must be distributed according to some normal distribution. The set
$\{t_r'\}$ may be used to test the goodness of fit of the observations to normality,
by first adjusting the set $\{t_r'\}$ to a standard basis of comparison, i.e., by considering
$F_r(t_r')$, where $F_r$ is the corresponding cumulative distribution function,
and then applying, for example, a $\chi^2$ goodness of fit test to these $n - 2$ quantities,
with respect to the rectangular distribution on (0, 1).
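
The procedure just outlined can be sketched in code. The block below is only an illustration under stated assumptions: the Student $t$ cumulative distribution is computed by Simpson's rule after the substitution $t = \sqrt{\nu}\tan\theta$ (a numerical device chosen here, not one named in the paper), the number of cells is arbitrary, and the input values of $t_r'$ are made up. The function names are hypothetical.

```python
import math

def t_cdf(t, df, steps=200):
    """Student t CDF. With t = sqrt(df)*tan(theta) the density integrates to
    c*sqrt(df) * integral of cos(theta)**(df-1) over a bounded interval,
    evaluated here by Simpson's rule (steps must be even)."""
    if t == 0:
        return 0.5
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * math.cos(j * h) ** (df - 1)
    area = c * math.sqrt(df) * total * h / 3.0
    return 0.5 + area if t > 0 else 0.5 - area

def probability_integral_transform(t_primes):
    """u_r = F_r(t'_r), where t'_r has r degrees of freedom (r = 1, 2, ...)."""
    return [t_cdf(t, r) for r, t in enumerate(t_primes, start=1)]

def chi_square_uniform(u, bins):
    """Pearson chi-square statistic of u against the rectangular law on (0, 1),
    using equal-width cells."""
    expected = len(u) / bins
    counts = [0] * bins
    for v in u:
        counts[min(int(v * bins), bins - 1)] += 1
    return sum((k - expected) ** 2 / expected for k in counts)

t_primes = [0.3, -1.2, 0.8, 2.1, -0.4, 0.1]   # hypothetical values of t'_1..t'_6
u = probability_integral_transform(t_primes)
print(u, chi_square_uniform(u, bins=3))
```

With only $n - 2$ transformed values, the cells must be few for the $\chi^2$ approximation to be reasonable; `math.gamma` also overflows for very large degrees of freedom, so this sketch suits moderate $n$ only.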


REFERENCES 

[1] M. S. Bartlett, Proc. Camb. Phil. Soc., Vol. 30 (1934), pp. 164-169.

[2] J. Neyman and E. S. Pearson, Biometrika, Vol. XXA (1928), pp. 175-240, 264-294.

[3] R. A. Fisher, Proc. Roy. Soc. A, Vol. 144 (1934), pp. 285-307.

[4] N. Levinson, Proc. London Math. Soc., Vol. 41 (1936), pp. 393-407.

[5] E. J. G. Pitman, Biometrika, Vol. 30 (1939), pp. 391-421.

Princeton University,

Princeton, N. J.





THE SELECTION OF VARIATES FOR USE IN PREDICTION WITH 
SOME COMMENTS ON THE GENERAL PROBLEM OF 
NUISANCE PARAMETERS 

By Harold Hotelling 

1. Maximum Correlation as a Test. For predicting or estimating a particular
variate $y$ there is frequently available an embarrassingly large number of other
variates having some correlation with $y$. For example, in fitting demand
functions by means of economic time series, the number of series of observations
having some relation to the demand which is sought to be estimated is apt to be
very large, whereas the number of good independent observations on each is
quite small. The proper coefficients in the regression equation must ordinarily
be determined from the observations, and must not exceed in number the observations
on each variate. Furthermore, in order to have a measure of error
that will make it possible to distinguish real effects from those due to chance,
it is necessary that the number of predictors¹ shall be enough less than the
number of observations on each variate so that the residual chance variance
can be determined with an appropriate degree of accuracy. It is desirable to
select a set of predictors yielding estimates of maximum but determinable accuracy,
and at the same time to avoid the fallacies of selection among numerous
results of that one which appears most significant and treating it as if it were
the only one examined.

Considerations other than maximum and determinate accuracy are of practical
importance. The labor of calculation by the method of least squares
becomes a serious obstacle to the use of the theoretically optimum set of variates
when these are very numerous, though the rapid current development of
mechanical and electrical devices suitable for these computations offers a hope
that the limits now set in practice in this way will soon be considerably increased.
Furthermore, predictions or estimates must, as in speculative business or in
military activity, be made from moment to moment, often in a rough manner
by persons incapable of or averse to using complex formulae, and in such activities
frequent revisions of the regression equations must be made to accord with
altered conditions. Also, in temporal predictions, the time of availability of

¹ I use this term for what are often called the independent variates in a regression
equation, since these ordinarily are not really independent in the probability sense. Similarly
I shall call the "dependent" variate the predictand. By prediction I mean merely the
use of regression equations to estimate some unknown variate by means of the values of
related variates, without any necessary connotation of temporal order, though the most
interesting applications seem for the most part to be those in which we pass from a knowledge
of the past to an estimate of the future.



the values of the predictors is important, since an early prediction (e.g. of the 
size of a harvest) is more valuable than a later one of the same accuracy. 

If we make the usual assumption² that the probability distribution of $y$ is,
for every set of values of the predictors, normal with a fixed variance $\sigma^2$ and an
expectation that is a linear function of the predictors, we shall wish to minimize
$\sigma^2$ subject to appropriate limitations, and this amounts to the same thing as
maximizing the multiple correlation $\rho$ of $y$ with the predictors, since $1 - \rho^2$ is
the ratio of $\sigma^2$ to the total variance of $y$, which is the same for all sets of predictors.
The estimates $s$ and $R$ of $\sigma$ and $\rho$ obtained from the available sample are of
course a different matter. But it is clear that the value of $R$ provides a suitable
criterion of choice under the following conditions: We are called upon to choose
one among two or more sets, each consisting of a fixed number of predictors;
for each predictor we have a known value corresponding to each of the values
$y_1, \dots, y_N$ observed for the predictand; and there is no basis for preferring one
of these sets to another either in theory, in observations extraneous to those just
specified, or in cost or time of availability. In particular, if just one predictor is
to be used, that having the highest sample correlation with the predictand should
under these conditions be the one adopted. But in making such a choice a test
of its accuracy is required, to take account of the possibility that the wrong
choice has been made because of chance fluctuations in the sample correlation
coefficients.

There are innumerable economic variates available for prediction of
business conditions, and most of these are highly correlated with each other.
The selection of one business index instead of another for a particular
purpose will involve the question which has exhibited the higher correlation
with the quantity to be predicted, and consequently the question of the definiteness
with which the difference between the calculated correlations can be
regarded as significant.

Our problem evidently has a bearing on governmental policy in selecting
among the numerous series of data those whose continuation will be most valuable.
The high cost of assembling these statistics dictates a careful selection of
a limited number of series having little correlation with each other's current
values, but with correlations as great as possible with those things whose prediction
or estimation is most important.

2. The Choice of one Predictor with Two Available. Let us take first the 
simplest case, which may be illustrated by a Michigan State College problem of 

² We shall not here go into the question of the applicability of these standard assumptions
to time series otherwise than to note that some transformations of observations
ordered in time are usually necessary and sufficient to obtain quantities satisfying the
assumptions so closely that deviations from them cannot be detected. Such transformations
include replacing a variate by its logarithm, and eliminating trend and seasonal
variations by least squares. In view of the satisfactory adjusted observations found
empirically by these and similar methods, the usual objections to studying time series by
exact methods seem much exaggerated.





which Dr. W. D. Baten has told me. The ultimate weight of a mature ox is
estimated by means of his length at an early age. The question has been raised,
however, whether a more accurate prediction might not be made by means of
the calf's girth at his heart. Records were at hand of 13 oxen showing their
lengths and girths as calves and also their weights when mature. A regression
equation involving both length and girth would presumably give greater accuracy
than either variate alone; but it appears that those who make the estimates
desire a simple formula involving only one variate. Suppose, then, that in such
a sample the correlation of weight with length is $r_1 = .7$, that the correlation
of weight with girth is $r_2 = .5$, and that the correlation of girth with length is
$r_0 = .4$. Is the difference $r_1 - r_2 = .2$ sufficiently great in relation to its sampling
errors to warrant the inference that length is really a better predictor than
girth, or must the question be left in abeyance until more observations can be
accumulated?

A straightforward procedure which would have been used with little question
before the advent of modern exact methods is to calculate the asymptotic approximation
to the standard error of $r_1 - r_2$ by the differential method, assuming
the three variates to have the trivariate normal distribution, and to regard the
difference of the correlations as significant if it exceeds a multiple of this standard
error determined by the tables of the normal distribution. The calculation of
the asymptotic approximation $\sigma_{r_1 - r_2}$ may be carried out in the following manner.
Let $\rho_1$, $\rho_2$, and $\rho_0$ be the population values of $r_1$, $r_2$, and $r_0$ respectively. Then
if $\sigma_{ij}$ denote the population covariance of $x_i$ and $x_j$ ($i, j = 0, 1, 2$), we have

$$\rho_1 = \frac{\sigma_{01}}{\sqrt{\sigma_{00}\sigma_{11}}},$$

with similar formulae for $\rho_2$ and $\rho_0$. Likewise the sample estimates of these
parameters are given by such expressions as

$$r_1 = \frac{s_{01}}{\sqrt{s_{00}s_{11}}}.$$

Taking the logarithm of this last expression, expanding about the population
values, denoting by the operator $\delta$ the deviation of sample from population values
of the covariances, and the resultant deviation in $r_1$, and dropping terms of
order higher than the first, we have:

$$\delta r_1 = \rho_1\left(\frac{\delta s_{01}}{\sigma_{01}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{11}}{2\sigma_{11}}\right).$$

In the same way

$$\delta r_2 = \rho_2\left(\frac{\delta s_{02}}{\sigma_{02}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{22}}{2\sigma_{22}}\right).$$

The asymptotic value of the sampling covariance is obtained by multiplying
these two expressions together and taking the expectation. The sampling covariance
of two estimates of covariance of the usual kind (sum of products
divided by number of degrees of freedom) in the same sample, having $n$ degrees
of freedom (which ordinarily means that there are $n + 1$ individuals in the
sample and that the means are eliminated), is given exactly by the formula³

$$E(\delta s_{ij}\,\delta s_{km}) = (\sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk})/n,$$

in which the subscripts may have any values, equal or unequal. When this
formula is applied to each of the nine terms of the product and the results are
expressed in terms of the correlations $\rho_i$, there results the asymptotic expression
for the covariance given by

$$nE(\delta r_1\,\delta r_2) = \tfrac{1}{2}\rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + \rho_0(1 - \rho_1^2 - \rho_2^2).$$

This method provides also one of the derivations of the familiar formula which
may be written

$$n\sigma_{r_1}^2 = nE(\delta r_1)^2 = (1 - \rho_1^2)^2, \qquad n\sigma_{r_2}^2 = (1 - \rho_2^2)^2.$$

The variance of the difference of $r_1$ and $r_2$ is the sum of their variances minus
twice their covariance. Hence

$$n\sigma_{r_1 - r_2}^2 = (1 - \rho_1^2)^2 + (1 - \rho_2^2)^2 - \rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + 2\rho_0(\rho_1^2 + \rho_2^2 - 1).$$


We are testing the hypothesis that $\rho_1 = \rho_2$. If we put a common value $\rho$
for them in the last expression and simplify, we obtain for the standard error
of the difference,

$$\sigma_{r_1 - r_2} = \sqrt{\frac{(1 - \rho_0)(2 - 3\rho^2 + \rho_0\rho^2)}{n}}.$$

The second factor in parentheses is always positive because of the inequalities
limiting the correlations among three variates.
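
To make the formula concrete, the sketch below evaluates it for the ox illustration of section 2. It is an added example, not the author's computation: the plug-in value $\rho = (r_1 + r_2)/2$ and the choice $n = 12$ degrees of freedom (13 oxen with means eliminated) are assumptions, and the function names are hypothetical. The surrounding text questions how far such plug-in standard errors can be trusted.

```python
import math

def var_diff_general(rho1, rho2, rho0, n):
    """Asymptotic variance of r1 - r2 before setting rho1 = rho2."""
    return ((1 - rho1 ** 2) ** 2 + (1 - rho2 ** 2) ** 2
            - rho1 * rho2 * (rho1 ** 2 + rho2 ** 2 + rho0 ** 2 - 1)
            + 2 * rho0 * (rho1 ** 2 + rho2 ** 2 - 1)) / n

def se_diff_equal(rho, rho0, n):
    """Standard error of r1 - r2 under rho1 = rho2 = rho."""
    return math.sqrt((1 - rho0) * (2 - 3 * rho ** 2 + rho0 * rho ** 2) / n)

r1, r2, r0 = 0.7, 0.5, 0.4     # sample correlations from the ox illustration
n = 12                         # assumed degrees of freedom: 13 oxen, means eliminated
rho_hat = (r1 + r2) / 2        # assumed plug-in estimate of the common rho

se = se_diff_equal(rho_hat, r0, n)
ratio = (r1 - r2) / se
print(se, ratio)
```

The simplified expression agrees with the general one when the two population correlations are set equal, which is a useful check on the algebra.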

This formula contains two unknown parameters, $\rho$ and $\rho_0$. The classical
procedure would be to substitute $r_1$, $r_2$ and $r_0$ respectively for $\rho_1$, $\rho_2$, and $\rho_0$ in the
previous formula, and use the resulting standard error expression as if the ratio
to it of $r_1 - r_2$ were normally distributed. A first modification, more in line with
modern ideas, would be to use some kind of average of $r_1$ and $r_2$ as an estimate
of both $\rho_1$ and $\rho_2$, since the null hypothesis tested is that these are equal. But
whatever sample estimates we substitute for $\rho$ and $\rho_0$, the formula remains unsatisfactory,
since no suitable limits of error are available. If instead of the
standard error we were to work out the exact distribution of $r_1 - r_2$ we should
still not be free from the difficulty. This exact distribution clearly involves
both $\rho$ and $\rho_0$, since its variance does so. Neither can we escape from the
trouble by using some function $z = f(r)$, such as the inverse hyperbolic tangent
suggested by R. A. Fisher, and considering the standard error of $z_1 - z_2 =$

3 1 have given a derivation of this formula from the characteristic function of the multi¬ 
variate normal distribution [1]. Numerous special cases appear in earlier literature. The 
derivation above is a simplification and improvement of several versions, appearing in 
the various early writings of Karl Pearson. 



SELECTION OP VARIATES 


275 


f(r i) — /(r*); for this standard error will have as the first term in its expansion 
in a series of powers of rT 1 simply the product of the expression above for 
ovj-r, by /'(p); and this must clearly involve both po and p. 

3. Nuisance Parameters. This is not by any means the only statistical problem
in which unknown and undesired parameters enter into the distribution of
the statistic which we should naturally use to test a hypothesis. Indeed, the 
early investigation which was perhaps most influential in setting the whole tone 
of modern statistical research was that [2] in which W. C. Gosset (“Student”) 
arrived at the exact distribution of the ratio of a deviation in the mean to the 
estimated standard error. The previous practice (which unfortunately survives 
today in some quarters, and is even taught to students without explaining its 
approximate character) was to neglect the sampling errors in the estimate of 
the unknown variance $\sigma^2$ and to treat the ratio as normally distributed with
unit variance. The rigorous derivation by Fisher [3] of the Student distribution
makes clear the manner in which the nuisance parameter $\sigma$ may in this, and in
some other, problems be eradicated from the distribution through integration, 
after altering the original statistic (the deviation in the mean) by dividing it 
by another statistic. The new statistic, the Student ratio, vanishes whenever 
the old statistic, the deviation in the mean, does so, and the same hypothesis 
is tested by both. This then is one way to get rid of a nuisance parameter: 
when you have a statistic estimating a parameter whose vanishing is in question, 
but whose distribution involves another parameter, alter the statistic by multiplying
or dividing by another statistic in such a way that the new function
vanishes whenever the old one does so; and do this in such a way that the new
distribution will be independent of the nuisance parameter. Unhappily, this
method has been applied successfully only in particular cases, and no way to
use it in the problem at hand has been found. 

A second method is that of transformation employed by Fisher in dealing with 
such problems as testing the significance of the difference between the correlation
coefficients in independent samples between the same two variates. The
need for the transformation in this case is occasioned by the presence in the 
distribution of the difference of the sample correlations of the unknown true 
value, which is not directly relevant to the comparison. We have seen that 
this method also fails to solve our problem. 

A third method of dealing with nuisance parameters is the use of fiducial 
probability by R. A. Fisher [4] and by Daisy M. Starkey [5] in testing the 
significance of the difference between the means of two samples when the 
variances may be unequal. Criticisms of these applications of fiducial probability 
have been made by M. S. Bartlett [6] and B. L. Welch [7], and the field of 
applicability of such methods is still in need of elucidation. 

Some findings of J. Neyman [8] having a bearing on the general nuisance 
parameter problem should also be noted. 

The only other class of methods for dealing with nuisance parameters of which
I am aware involves the comparison of the particular sample obtained, not with
the whole population of samples with which a comparison might be made if we 
knew the value of the troublesome parameter, but with a sub-population selected 
with reference to the sample in such a way that the distribution, in this
sub-population, of the statistic used does not involve any unknown parameter. An
example is the testing of significance of a regression coefficient. Thus if we
suppose that a sample of values of x and y is drawn from a bivariate normal
population, and calculate the regression coefficient b of y on x in the sample,
the distribution of b involves not only the population value $\beta$, but also the ratio
a of the variances in the population. Since this second parameter is unknown,
and can only be estimated from the sample, it is not possible to use the distribution
of b in the whole population directly to test the significance of $b - \beta$.
What we do is to find the place of this difference, not in the whole population 
of values in which both x and y are drawn at random, but in a sub-population 
for which the values of x are the same as in our sample. We may alternatively 
say that we limit the sub-population only to that for which the sum of the 
squares of the deviations of the values of x from their mean is the same as in 
our sample; the results are the same. The distribution in this sub-population 
of the ratio of $b - \beta$ to its estimated standard error is of the Student form, with
no unknown parameters, and on this basis it is possible to make exact and 
satisfactory tests and to set up fiducial limits for b. Another example is that 
of contingency tables. The practice now accepted (after a controversy) for 
testing independence of two modes of classification, such as classification 
of persons according as they have or have not been vaccinated, and again according
as they live through an epidemic or die, is to compare the observed
contingency table, not with all possible contingency tables of the same numbers 
of rows and columns, but only with the possible contingency tables having 
exactly the same marginal totals as the observed table. 

4. An Exact Solution. We shall solve the problem of the significance of the
difference of $r_1$ and $r_2$ with the understanding that the meaning of significance
is to be interpreted by reference to the sub-population of possible samples for
which the predictors $x_1$ and $x_2$ have the same set of values as those observed in
the particular sample available. This procedure, besides yielding an exact
distribution without unknown parameters, has the advantage of relaxing the
stringency of the requirement of a trivariate normal distribution. We now make
only the assumptions customary in the method of least squares, that the predictand
y has the univariate normal distribution for each set of values of $x_1$ and
$x_2$, independently for the different sets, with a common variance $\sigma^2$, and with
the expectation of y for a fixed pair of values of the predictors a linear function
of these predictors. No assumption is involved regarding the distribution of
the predictors, since we regard them as fixed in all the samples with which we
compare our particular sample. The advantages of exactness and of freedom
from the somewhat special trivariate normal assumption are attained at the
expense of sacrificing the precise applicability of the results to other sets of
values of the predictors.

Since the correlational properties are unchanged by additive and multiplicative
constants, we may suppose that

$$Sx_1 = 0 = Sx_2, \qquad Sx_1^2 = 1 = Sx_2^2, \tag{1}$$

where S stands for summation over a sample of N individuals. The notation
may be made more explicit by the adjunction of an additional subscript $\alpha$, varying
from 1 to N, to denote the individual member of the sample, so that instead
of $Sx_1$, for example, we might write $Sx_{1\alpha}$. The omission of this additional
subscript is convenient and will usually leave no ambiguity when we deal with
sums, but it will be convenient to retain it in connection with individual values.
The correlation $r_0$ of $x_1$ with $x_2$ in all those samples we shall consider is, by (1),

$$r_0 = Sx_1x_2.$$

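Now consider the new quantities

$$x'_\alpha = \frac{x_{1\alpha} - x_{2\alpha}}{\sqrt{2(1 - r_0)}}, \qquad x''_\alpha = \frac{x_{1\alpha} + x_{2\alpha}}{\sqrt{2(1 + r_0)}}. \tag{2}$$

Evidently, from (1) and (2),

$$Sx' = 0 = Sx'', \qquad Sx'^2 = 1 = Sx''^2, \qquad Sx'x'' = 0. \tag{3}$$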

Since the mean value $E(y_\alpha)$ is a linear function of $x_{1\alpha}$ and $x_{2\alpha}$, $y_\alpha$ may, upon
subtracting a constant from all these expectations, be written

$$y_\alpha = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \Delta_\alpha, \tag{4}$$

where $\Delta_1, \cdots, \Delta_N$ are normally and independently distributed with variances
all equal to $\sigma^2$ and expectations zero. The assumption that $x_1$ and $x_2$ are equally
correlated with y in the population leads to the conclusion that $\beta_1 = \beta_2$, and
putting $\beta = \beta_1\sqrt{2(1 + r_0)}$, we then have from (4) and (2):

$$y_\alpha = \beta x''_\alpha + \Delta_\alpha. \tag{5}$$

Consequently, by (3)

$$Sx'y = Sx'_\alpha y_\alpha = \beta Sx'x'' + Sx'\Delta = Sx'\Delta;$$

and this function has a normal distribution with zero mean and variance $\sigma^2$.
If in the sample we work out a regression equation

$$Y = a + b'x' + b''x'',$$

the normal equations for determining b' and b'' must by (3) take the simple forms

$$a = \bar{y}, \qquad b' = Sx'y, \qquad b'' = Sx''y.$$





From the general theory of least squares it is known that the sum of squares
of residuals is

$$S_1 = S(y - Y)^2 = Sy^2 - \bar{y}Sy - (Sx'y)^2 - (Sx''y)^2,$$

and that $S_1/\sigma^2$ has the $\chi^2$ distribution with $n = N - 3$ degrees of freedom,
independently both of $Sx'y$ and of $Sx''y$. From these facts it follows that

$$t = Sx'y\,\sqrt{\frac{n}{S_1}} \tag{6}$$

has the Student distribution with n degrees of freedom. Since in accordance
with the foregoing definitions and (1) we have

$$Sx'y = (r_1 - r_2)\sqrt{\frac{S(y - \bar{y})^2}{2(1 - r_0)}},$$

and since also it is known that

$$S_1 = S(y - \bar{y})^2\,\frac{D}{1 - r_0^2},$$

where

$$D = \begin{vmatrix} 1 & r_1 & r_2 \\ r_1 & 1 & r_0 \\ r_2 & r_0 & 1 \end{vmatrix},$$

(6) may be written

$$t = (r_1 - r_2)\sqrt{\frac{(N - 3)(1 + r_0)}{2D}}. \tag{7}$$

The probability of a greater value of $|t|$ is given by tables of the Student
distribution with $n = N - 3$. If this probability is sufficiently small (which
conventionally means less than .05, or sometimes .01) we have a corresponding
degree of confidence that the variate chosen because of a higher correlation in 
the sample has actually a higher correlation than the other in the population. 
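
The statistic of (7) can be computed in two ways, which should agree. The sketch below is an illustration only, under the fixed-predictor assumptions of this section: it works either from the three sample correlations, or directly from the data through the transformed predictors x' and x'' and the residual sum of squares, as in (6). All function names are hypothetical, and the probability of a greater $|t|$ must still be read from Student tables with N − 3 degrees of freedom.

```python
import math


def unit_deviations(values):
    """Deviations from the mean, scaled to unit sum of squares (conditions (1))."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    scale = math.sqrt(sum(d * d for d in dev))
    return [d / scale for d in dev]


def dot(u, v):
    """Sum of products of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def t_from_correlations(r1, r2, r0, n_obs):
    """Formula (7): r1, r2 are correlations with y, r0 that between predictors."""
    d = 1 + 2 * r1 * r2 * r0 - r1**2 - r2**2 - r0**2  # determinant D
    return (r1 - r2) * math.sqrt((n_obs - 3) * (1 + r0) / (2 * d))


def t_from_data(x1, x2, y):
    """Formula (6): t = Sx'y * sqrt(n / S1), with n = N - 3."""
    n_obs = len(y)
    z1, z2 = unit_deviations(x1), unit_deviations(x2)
    r0 = dot(z1, z2)
    x_prime = [(a - b) / math.sqrt(2 * (1 - r0)) for a, b in zip(z1, z2)]
    x_second = [(a + b) / math.sqrt(2 * (1 + r0)) for a, b in zip(z1, z2)]
    y_bar = sum(y) / n_obs
    y_dev = [v - y_bar for v in y]
    # Because Sx' = Sx'' = 0, sums of products with y equal those with y - mean.
    b_prime, b_second = dot(x_prime, y_dev), dot(x_second, y_dev)
    s1 = dot(y_dev, y_dev) - b_prime**2 - b_second**2  # residual sum of squares
    return b_prime * math.sqrt((n_obs - 3) / s1)


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the paper.
    x1 = [1, 2, 3, 4, 5, 6, 7, 8]
    x2 = [2, 1, 4, 3, 6, 5, 8, 9]
    y = [2.0, 1.0, 4.5, 3.0, 5.5, 7.0, 6.0, 9.5]
    print(t_from_data(x1, x2, y))
```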


5. The Selection of One Variate from Among Three or More. Suppose that
we are to choose one of the variates $x_1, \cdots, x_p$ in order to predict y $(p < N - 1)$.
We choose the one having highest correlation, and wonder how much confidence
to place in this choice. We shall now determine the distribution of a function
suitable for testing the hypothesis that there is no real difference between any
pair of the correlations of $x_1, \cdots, x_p$ with y. Again we shall assume the values
of these predictors fixed, and look for the place of our particular sample among
all samples having these values, with only y free to vary normally by chance.

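Let $a_{ij} = S(x_i - \bar{x}_i)(x_j - \bar{x}_j)$, and let $c_{ij}$ be the cofactor of $a_{ij}$ in the
determinant a of these quantities, divided by a. Then

$$\sum_i a_{ij}c_{ik} = \delta_{jk} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases} \tag{8}$$

Here $\Sigma$ stands for summation from 1 to p. Let

$$w_i = \frac{\sum_j c_{ij}}{\sum\sum c_{ij}}, \tag{9}$$

$$l_i = S(x_i - \bar{x}_i)y, \tag{10}$$

$$\bar{l} = \sum_i w_i l_i. \tag{11}$$

From (9) it follows that

$$\sum_i w_i = 1. \tag{12}$$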

From the hypothesis that y is in the population equally correlated with all the
$x_i$ it follows that $l_1, \cdots, l_p$ have equal expectations, which we may denote by
$\lambda$; and from (11) and (12) it follows that also $E(\bar{l}) = \lambda$. Obviously

$$E(l_i - \lambda)(l_j - \lambda) = \sigma^2 a_{ij}, \tag{13}$$

where $\sigma^2$ is the variance of those values of y corresponding to a fixed set of
values of the x's. From (11), (13) and (9) we obtain

$$E(\bar{l} - \lambda)^2 = \frac{\sigma^2}{\sum\sum c_{ij}}. \tag{14}$$

Since the $l_i$ are linear functions of the y's, they have the multivariate normal
distribution. From the theory of this distribution and the values (13) of the
covariances it follows that the distribution has the form

$$(2\pi\sigma^2)^{-p/2}\, a^{-1/2}\, e^{-T/2\sigma^2}\, dl_1 \cdots dl_p,$$

where a is the determinant of the $a_{ij}$'s, and

$$T = \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda).$$

We may introduce linear functions $l'_1, \cdots, l'_p$ of $l_1 - \lambda, \cdots, l_p - \lambda$ such that
$T = l_1'^2 + \cdots + l_p'^2$, and such that $l'_p = (\bar{l} - \lambda)\sqrt{\sum\sum c_{ij}}$. Now
$(T - l_p'^2)/\sigma^2 = (l_1'^2 + \cdots + l_{p-1}'^2)/\sigma^2$
has the $\chi^2$ distribution with $p - 1$ degrees of freedom. The numerator of this
expression equals

$$\begin{aligned}
T - l_p'^2 &= \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda) - (\bar{l} - \lambda)^2\sum\sum c_{ij} \\
&= \sum\sum c_{ij}l_il_j - \bar{l}^2\sum\sum c_{ij} \\
&= \sum\sum c_{ij}(l_i - \bar{l})(l_j - \bar{l}).
\end{aligned}$$

The penultimate form shows that this function is independent of $\lambda$; the last,
as a positive definite form in the deviations of the l's from their weighted mean,
shows that sufficiently large values of the expression will reveal with definiteness 
the inequality of the predicting powers of the p variates when this exists. 





It is well known that the regression coefficients of y upon the set of variates
$x_1, \cdots, x_p$ are completely independent of the sum of squares $Sv^2$ of residuals
from the regression equation. Since the l's are linear functions of these regression
coefficients (namely the linear functions appearing in the normal equations),
they also are independent of $Sv^2$. Hence, if we put

$$s_1^2 = \frac{\sum\sum c_{ij}l_il_j - \bar{l}^2\sum\sum c_{ij}}{p - 1}, \qquad s_2^2 = \frac{Sv^2}{N - p - 1},$$

the ratio $F = s_1^2/s_2^2$ will, in case of equality of the correlations of the various
x's with y, have the variance ratio distribution with $n_1 = p - 1$ and $n_2 = N - p - 1$
degrees of freedom. When p = 2 this test reduces exactly to (7), as it
should, and $F = t^2$.

In the numerical application of this method, the regression coefficients $b_i$
of y on $x_1, \cdots, x_p$ should first be worked out by the inverse matrix method.
The right-hand members of the normal equations are $l_1, \cdots, l_p$, the coefficients
in these equations are the $a_{ij}$, and the calculation of $s_1^2$ is simplified with the
help of the identity

$$\sum\sum c_{ij}l_il_j = \sum b_il_i.$$
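
The numerical procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function names are hypothetical, each predictor is first reduced to deviations from its mean with unit sum of squares (an assumption made here so that equal correlations with y correspond to equal expectations of the $l_i$), and the inverse matrix is obtained by plain Gauss-Jordan elimination. The identity $\sum\sum c_{ij}l_il_j = \sum b_il_i$ is used both for $s_1^2$ and for the residual sum of squares.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standardize(column):
    """Deviations from the mean, scaled to unit sum of squares."""
    mean = sum(column) / len(column)
    dev = [v - mean for v in column]
    norm = math.sqrt(sum(d * d for d in dev))
    return [d / norm for d in dev]


def selection_f(xs, y):
    """Return (F, n1, n2) for a list xs of p predictor columns and responses y."""
    p, n_obs = len(xs), len(y)
    z = [standardize(col) for col in xs]
    y_bar = sum(y) / n_obs
    y_dev = [v - y_bar for v in y]
    syy = sum(d * d for d in y_dev)
    a = [[sum(u * v for u, v in zip(zi, zj)) for zj in z] for zi in z]  # a_ij
    l = [sum(u * v for u, v in zip(zi, y_dev)) for zi in z]            # l_i
    c = invert(a)                                                      # c_ij
    b = [sum(c[i][j] * l[j] for j in range(p)) for i in range(p)]      # b_i
    q = sum(bi * li for bi, li in zip(b, l))   # sum b_i l_i = sum sum c_ij l_i l_j
    c_total = sum(sum(row) for row in c)       # sum sum c_ij
    l_bar = sum(b) / c_total                   # weighted mean of the l_i
    s1_sq = (q - l_bar**2 * c_total) / (p - 1)
    s2_sq = (syy - q) / (n_obs - p - 1)        # residual sum of squares / d.f.
    return s1_sq / s2_sq, p - 1, n_obs - p - 1


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the paper.
    x1 = [1, 2, 3, 4, 5, 6, 7, 8]
    x2 = [2, 1, 4, 3, 6, 5, 8, 9]
    x3 = [3, 1, 2, 6, 4, 8, 5, 7]
    y = [2.0, 1.0, 4.5, 3.0, 5.5, 7.0, 6.0, 9.5]
    print(selection_f([x1, x2, x3], y))
```

For p = 2 the value of F returned by this sketch should equal the square of the t of (7), which provides a convenient check.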


6. Selection of Additional Variates When Some Have Been Chosen. Suppose
now that q predictors have been included definitely in the regression equation,
and that one more is to be selected for inclusion among p additional predictors
that are available. The criterion now is that that one should be chosen
tentatively which has the highest partial correlation with the predictand, eliminating
those already definitely chosen; but the confidence to be placed in the
choice is to be judged by an adaptation of the criterion of the preceding section.
It is only necessary to consider the $a_{ij}$, $l_i$, $c_{ij}$ and $b_i$ $(i, j = 1, \cdots, p)$ as calculated
from the new predictors and the deviations of y from the regression
equation on the predictors already adopted. Formulae may easily be derived
for the values of these quantities in terms of those already found and the sums
of products, so as to simplify the calculations. $Sv^2$ will now stand for the sum
of squares of residuals from the regression equation involving all the p + q
predictors. It is to be divided by $N - p - q - 1$ to obtain $s_2^2$. The numbers
of degrees of freedom with respect to which F is to be judged are now $n_1 = p - 1$
and $n_2 = N - p - q - 1$. When p = 2 this test, like that of the preceding
section, reduces to the use of the t-distribution of (7), with $n = N - q - 3$,
and the correlations standing for partial correlations eliminating the predictors
already definitely chosen.
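
For the simplest case, p = 2 with a single predictor z already chosen (q = 1), the reduction to (7) may be sketched as follows. This is an illustration only: the function names are hypothetical, and the partial correlations are obtained from the standard first-order formula, which the paper does not write out.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b, eliminating c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def t_after_one_predictor(r_y1, r_y2, r_12, r_yz, r_1z, r_2z, n_obs):
    """Statistic (7) with partial correlations and n = N - q - 3, for q = 1."""
    p1 = partial_corr(r_y1, r_yz, r_1z)   # y with x1, eliminating z
    p2 = partial_corr(r_y2, r_yz, r_2z)   # y with x2, eliminating z
    p0 = partial_corr(r_12, r_1z, r_2z)   # x1 with x2, eliminating z
    d = 1 + 2 * p1 * p2 * p0 - p1**2 - p2**2 - p0**2
    n = n_obs - 1 - 3                     # degrees of freedom with q = 1
    return (p1 - p2) * math.sqrt(n * (1 + p0) / (2 * d))


if __name__ == "__main__":
    # Arbitrary illustrative correlations, not taken from the paper.
    print(t_after_one_predictor(0.6, 0.4, 0.5, 0.3, 0.2, 0.1, 40))
```

When z is uncorrelated with y, $x_1$ and $x_2$, the partial correlations equal the ordinary ones and only the number of degrees of freedom changes.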

A special instance in which this procedure is applicable is in economic time 
series, in which time, in the form of orthogonal polynomials, must ordinarily be 
“partialled out” in order that tests of significance may be sound. 





7. Further Problems. It is natural to ask whether the foregoing work can be 
extended to examine the soundness of the selection, on the basis of a greater 
multiple correlation, of a particular set of two or more variates, chosen from 
among several such sets. The simplest such problem that goes beyond what 
has been done above deals with two sets, each of two predictors, having in a 
sample multiple correlations R and R' with the predictand. The question is
whether the difference $R - R'$ is significant.

Suppose that, in the interests of simplicity and the hope of attaining a solution
satisfactorily free from unknown parameters, we assume as before that the
predictors have a fixed set of values, the same in all samples. Since multiple
correlations are invariant under linear transformations of predictors, we may
without loss of generality assume that the predictors in each set are mutually
uncorrelated and have sums of squares equal to unity. Indeed, we may go
somewhat further in standardizing the sets of values to which consideration can
be confined without loss of generality, with the help of some ideas introduced
in the paper [1]. In the terminology of that paper, the variates in each set may
be considered canonical with respect to the relationship between the sets. This
means that linear functions $x_1$ and $x_2$ of the two variates in one set, and linear
functions $x'_1$ and $x'_2$ of those in the other set, can be chosen so as to satisfy not
only the conditions

$$\begin{gathered}
Sx_1 = Sx_2 = Sx'_1 = Sx'_2 = 0, \\
Sx_1^2 = Sx_2^2 = Sx_1'^2 = Sx_2'^2 = 1, \\
Sx_1x_2 = 0 = Sx'_1x'_2,
\end{gathered} \tag{15}$$

but also the further conditions

$$Sx_1x'_2 = 0 = Sx_2x'_1. \tag{16}$$

This means that, for all the purposes in view, the two sets of predictors can be
characterized as to their mutual relationships by the values of the remaining
two sums of products, namely

$$c_1 = Sx_1x'_1, \qquad c_2 = Sx_2x'_2.$$

In view of the conditions assumed earlier, $c_1$ and $c_2$ are what have been called
the canonical correlations between the two sets.

To the sets thus standardized, the predictand y is related in a manner expressed
by the population regression coefficients $\beta_1$ and $\beta_2$ of y on the first set, and $\beta'_1$
and $\beta'_2$ on the second. If we take y as having unit variance in the population,
the squared multiple correlation coefficients in the two cases will be

$$\rho^2 = \beta_1^2 + \beta_2^2, \qquad \rho'^2 = \beta_1'^2 + \beta_2'^2.$$

The hypothesis to be tested is that $\rho = \rho'$. If $b_1, b_2, b'_1, b'_2$ denote the sample
estimates of the regression coefficients, the statistic appropriate for the test
would appear necessarily to be proportional to

$$w = \tfrac{1}{2}(b_1^2 + b_2^2 - b_1'^2 - b_2'^2).$$





The sample regression coefficients are normally distributed, with population
correlations equal to the sample correlations among the corresponding predictors.
The variance of each is $\sigma^2$. Thus their joint distribution may be written down
at once, in a rather simple form in view of (15) and (16). From this it is possible
to determine directly the characteristic function $M(t) = Ee^{tw}$ of w. If
we write $K(t) = \log M(t)$ we obtain:

$$2K(t) = \sum\{(\beta_j^2 - 2c_j\beta_j\beta'_j + \beta_j'^2)t^2 + (\beta_j^2 - \beta_j'^2)t\}\{1 - (1 - c_j^2)t^2\}^{-1} - \sum\log\{1 - (1 - c_j^2)t^2\}.$$

Here the summations are with respect to j over the values 1 and 2. If each set
of predictors had had s members, the same result would hold for K(t) except
that the summations with respect to j would then extend from 1 to s.

This is a very disappointing result because it contains so many parameters. 
The distribution of w must contain the same 8 parameters as its characteristic 
function. All the four parameters $\beta_j$, $\beta'_j$ appear in the expression above, though
their effective number is reduced to three by the condition that the two sums
of squares shall be equal which constitutes the hypothesis under test. The
distribution of w thus contains at least three unknown parameters besides $\sigma$.

The estimate of variance $s^2$ obtained from the residuals from the grand regression
equation of y on $x_1$, $x_2$, $x'_1$, and $x'_2$ is independent of w. Its distribution
is of the usual form and involves a parameter, the population variance,
which is a function of $\beta_1$, $\beta_2$, $\beta'_1$, and $\beta'_2$. We could therefore pass by a single
integration from the distribution of w to that of the statistic $w/s^2$, which vanishes
with w, and which on this account, and on grounds of physical dimensionality,
might be considered appropriate to test the hypothesis that $\rho = \rho'$. The question
may be raised whether the distribution of this ratio might not be free from
parameters. The answer unfortunately is in the negative, as appears from an
examination of the characteristic function of the ratio. Even in the simplified
case in which all the $c_j$ are equal, a troublesome parameter persists in the
distribution.

Thus we meet again the problem of nuisance parameters, and this time no 
escape is visible. Perhaps some such artifice as those enumerated in paragraph 
3 (for example, some further limitation of the sub-population within which we 
should seek the place of our particular sample) is capable of yielding an exact, 
or “studentized” distribution, but this has not yet been found. The problem 
is of considerable interest, not only because of its practical importance, but 
because of its suggestiveness in connection with general theory. 

Numerous other problems having both practical importance and general 
theoretical interest are associated with the selection of predictors. For example, 
we have not dealt at all with the problem of the number of predictors that 
should be used when maximum accuracy in prediction, or in evaluation of the 
regression coefficients, is the sole criterion. A particular case is the determination
of the degree of the regression polynomial which should be fitted to obtain
maximum accuracy, for example of the number of orthogonal polynomials in
fitting a trend. Such customary criteria as minimizing the estimated variance 
of deviations, in which the sum of squares which is the numerator and the 
number of degrees of freedom which is the denominator both diminish to zero 
as the number of variates is increased, do not rest upon any satisfactory general 
theory. 

Another related set of problems is concerned with variates more numerous 
than the observations on each. It is clear that there is real information inherent
in data of this kind, but existing theory and methods, including those of
the present paper, are not adequate to utilize it in a thoroughly efficient manner. 
A recent paper of P. L. Hsu [9] is unique in not excluding the case in which the 
variates outnumber the observations. 

8. Summary. A criterion has been obtained for judging the definiteness of 
the selection of a particular variate, from among several available for prediction, 
on the basis of its having the maximum sample correlation with the predictand. 
A variation of this criterion is applied in paragraph 6 to the problem of extending 
the list of variates to be used in a regression formula. 

Some of the problems of “nuisance parameters” which affect general theory 
are illustrated in this problem. Some outstanding unsolved problems related 
to these questions are discussed in paragraph 7. 

REFERENCES 

[1] Harold Hotelling, “Relations Between Two Sets of Variates,” Biometrika, Vol. 28 (1936), pp. 321-377.

[2] “Student,” “The Probable Error of a Mean,” Biometrika, Vol. 6 (1908), pp. 1-25.

[3] R. A. Fisher, “Applications of ‘Student's’ Distribution,” Metron, Vol. 5 (1925), pp. 90-104.

[4] R. A. Fisher, “The Fiducial Argument in Statistical Inference,” Annals of Eugenics, Vol. 6 (1935), pp. 391-398. See also Fisher’s answer to Bartlett in the Annals of Math. Stat., Vol. 10 (1939), pp. 383-388 and the references there given.

[5] Daisy M. Starkey, “A Test of the Significance of the Difference Between Means of Samples from Two Normal Populations Without Assuming Equal Variances,” Annals of Math. Stat., Vol. 9 (1938), pp. 201-213.

[6] M. S. Bartlett, “The Information Available in Small Samples,” Proc. Camb. Phil. Soc., Vol. 32 (1936), pp. 560-566.

[7] B. L. Welch, “On Confidence Limits and Sufficiency, with Particular Reference to Parameters of Location,” Annals of Math. Stat., Vol. 10 (1939), pp. 58-69.

[8] Statistical Research Memoirs, Vol. 2 (1938), pp. 58-59.

[9] P. L. Hsu, “On the Distribution of Roots of Certain Determinantal Equations,” Annals of Eugenics, Vol. 9 (1939), pp. 250-258.


Columbia University, 
New York, N. Y. 



THE FITTING OF STRAIGHT LINES IF BOTH VARIABLES ARE 
SUBJECT TO ERROR 

By Abraham Wald 

1. Introduction. The problem of fitting straight lines if both variables x
and y are subject to error, has been treated by many authors. If we have N > 2
observed points $(x_i, y_i)$ $(i = 1, \cdots, N)$, the usually employed method of least
squares for determining the coefficients a, b, of the straight line y = ax + b
is that of choosing values of a and b which minimize the sum of the squares of
the residuals of the y's, i.e. $\Sigma(ax_i + b - y_i)^2$ is a minimum. It is well known
that treating y as an independent variable and minimizing the sum of the
squares of the residuals of the x's, we get a different straight line as best fit. It
has been pointed out¹ that if both variables are subject to error there is no
reason to prefer one of the regression lines described above to the other. For
obtaining the “best fit,” which is not necessarily equal to one of the two lines
mentioned, new criteria have to be found. This problem was treated by R. J.
Adcock as early as 1877.²

He defines the line of best fit as the one for which the sum of the squares of 
the normal deviates of the N observed points from the line becomes a minimum. 
(Another early attempt to solve this problem by minimizing the sum of squares
of the normal deviates was made by Karl Pearson.³)

Many objections can be raised against this method. First, there is no justification
for minimizing the sum of the squares of the normal deviates, and not
the deviations in some other direction. Second, the straight line obtained by
that method is not invariant under transformation of the coordinate system.
It is clear that a satisfactory method should give results which do not depend
on the choice of a particular coordinate system. This point has been emphasized
by C. F. Roos. He gives⁴ a good summary of the different methods and
then proposes a general formula for fitting lines (and planes in case of more than 
two variables) which do not depend on the choice of the coordinate system. 

¹ See for instance Henry Schultz’ “The Statistical Law of Demand,” Jour. of Political
Economy, Vol. 33, Dec. (1925).

² Analyst, Vol. IV, p. 183 and Vol. V, p. 53.

³ “On Lines and Planes of Closest Fit to Systems of Points in Space,” Phil. Mag. 6th
Ser. Vol. II (1901).

⁴ “A General Invariant Criterion of Fit for Lines and Planes where all Variates are
Subject to Error,” Metron, February 1937. See also Oppenheim and Roos, Bulletin of the
American Mathematical Society, Vol. 34 (1928), pp. 140-141.



Roos’ formula includes many previous solutions⁵ as special cases. H. E. Jones⁶
gives an interesting geometric interpretation of Roos’ general formula.

It is a common feature of Roos’ general formula and of all other methods 
proposed in recent years that the fitted straight line cannot be determined 
without a priori assumptions (independent of the observations) regarding the 
weights of the errors in the variables x and y. That is to say, the standard
deviations of the errors in x and in y (or at least their ratio) are
involved in the formula of the fitted straight line, and there is no method given
by which those standard deviations can be estimated by means of the observed
values of x and y.

R. Frisch⁷ has developed a new general theory of linear regression analysis,
when all variables are subject to error. His very interesting theory employs 
quite new methods and is not based on probability concepts. Also on the basis 
of Frisch’s discussion it seems that there is no way of determining the “true” 
regression without a priori assumptions about the disturbing intensities. 

T. Koopmans⁸ combined Frisch’s regression theory with the classical one in
a new general theory based on probability concepts. Also, according to his 
theory, the regression line can be determined only if the ratio of the standard 
deviations of the errors is known. 

In a recent paper R. G. D. Allen⁹ gives a new interesting method for determining
the fitted straight line in case of two variables x and y. Denoting by $\sigma_\epsilon$
the standard deviation of the errors in x, by $\sigma_\eta$ the standard deviation of the
errors in y and by $\rho$ the correlation coefficient between the errors in the two
variables, Allen emphasizes (p. 194)⁹ that the fitted line can be determined only
if the values of two of the three quantities $\sigma_\epsilon$, $\sigma_\eta$, $\rho$ are given a priori.

Finally I should like to mention a paper by C. Eisenhart,¹⁰ which contains
many interesting remarks related to the subject treated here. 

In the present paper I shall deal with the case of two variables x and y in
which the errors are uncorrelated. It will be shown that under certain conditions:

(1) The fitted straight line can be determined without making a priori assumptions
(independent of the observed values x and y) regarding the standard
deviations of the errors.

(2) The standard deviation of the errors can be well estimated by means of
the observed values of x and y. The precision of the estimate increases with
the number of the observations and would give the exact values if the number
of observations were infinite. (See in this connection also condition V in
section 3.)

⁵ For instance also Corrado Gini’s method described in his paper, “Sull’ Interpolazione
di una Retta Quando i Valori della Variabile Indipendente sono Affetti da Errori Accidentali,”
Metron, Vol. I, No. 3 (1921), pp. 63-82.

⁶ “Some Geometrical Considerations in the General Theory of Fitting Lines and Planes,”
Metron, February 1937.

⁷ Statistical Confluence Analysis by Means of Complete Regression Systems, Oslo, 1934.

⁸ Linear Regression Analysis of Economic Time Series, Haarlem, 1937.

⁹ “The Assumptions of Linear Regression,” Economica, May 1939.

¹⁰ “The interpretation of certain regression methods and their use in biological and
industrial research,” Annals of Math. Stat., Vol. 10 (1939), pp. 162-186.

2. Formulation of the Problem. Let us begin with a precise formulation of
the problem. We consider two sets of random variables¹¹

$$x_1, \cdots, x_N; \qquad y_1, \cdots, y_N.$$

Denote the expected value $E(x_i)$ of $x_i$ by $X_i$ and the expected value $E(y_i)$ of
$y_i$ by $Y_i$ $(i = 1, \cdots, N)$. We shall call $X_i$ the true value of $x_i$, $Y_i$ the true
value of $y_i$, $x_i - X_i = \epsilon_i$ the error in the i-th term of the x-set, and $y_i - Y_i = \eta_i$
the error in the i-th term of the y-set.

The following assumptions will be made: 

I. The random variables $\epsilon_1, \cdots, \epsilon_N$ each have the same distribution and they
are uncorrelated, i.e. $E(\epsilon_i\epsilon_j) = 0$ for $i \neq j$. The variance of $\epsilon_i$ is finite.

II. The random variables $\eta_1, \cdots, \eta_N$ each have the same distribution and are
uncorrelated, i.e. $E(\eta_i\eta_j) = 0$ for $i \neq j$. The variance of $\eta_i$ is finite.

III. The random variables $\epsilon_i$ and $\eta_j$ $(i = 1, \cdots, N; j = 1, \cdots, N)$ are uncorrelated,
i.e. $E(\epsilon_i\eta_j) = 0$.

IV. A single linear relation holds between the true values X and Y, that is to
say $Y_i = \alpha X_i + \beta$ $(i = 1, \cdots, N)$.

Denote by $\epsilon$ a random variable having the same probability distribution as
possessed by each of the random variables $\epsilon_1, \cdots, \epsilon_N$, and by $\eta$ a random
variable having the same distribution as $\eta_1, \cdots, \eta_N$.

The problem to be solved can be formulated as follows: 

We know only two sets of observations: $x'_1, \cdots, x'_N$; $y'_1, \cdots, y'_N$, where $x'_i$
denotes the observed value of $x_i$ and $y'_i$ denotes the observed value of $y_i$. We
know neither the true values $X_1, \cdots, X_N$; $Y_1, \cdots, Y_N$, nor the coefficients
$\alpha$ and $\beta$ of the linear relation between them. We have to estimate by means
of the observations $x'_1, \cdots, x'_N$; $y'_1, \cdots, y'_N$, (1) the values of $\alpha$ and $\beta$, (2) the
standard deviation $\sigma_\epsilon$ of $\epsilon$, and (3) the standard deviation $\sigma_\eta$ of $\eta$.

Problems of this kind occur often in Economics, where we are dealing with
time series. For example, denote by $x_i$ the price of a certain good $G$ in the
period $t_i$, and by $y_i$ the quantity of $G$ demanded in $t_i$. In each time period $t_i$
there exists a normal price $X_i$ and a normal demand $Y_i$ which would obtain if
the influence of some accidental disturbances could be eliminated. If we have
reason to assume that there exists between the normal price and the normal
demand a linear relationship we have to deal with a problem of the kind
described above.

In the following discussions we shall use the notations $x_i$ and $y_i$ also for their


11 A random or stochastic variable is a real variable associated with a probability 
distribution. 



FITTING OF STRAIGHT LINES




observed values $x'_i$ and $y'_i$ since it will be clear in which sense they are meant
and no confusion can arise.


3. Consistent Estimates of the Parameters $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. For the sake of
simplicity we assume that $N$ is even. We consider the expressions

$$a_1 = \frac{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}{N}, \qquad a_2 = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{N}, \tag{1}$$

where $m = N/2$. As an estimate of $\alpha$ we shall use the expression

$$a = \frac{a_2}{a_1} = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}. \tag{2}$$

We make the assumption
V. The limit inferior of

$$\left| \frac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N} \right| \qquad (N = 2, 3, \cdots \text{ ad inf.})$$

is positive.

We shall prove that $a$ is a consistent estimate of $\alpha$, i.e. $a$ converges stochastically
to $\alpha$ with $N \to \infty$, if the assumptions I-V hold. Denote the expected
value of $a_1$ by $\bar{a}_1$ and the expected value of $a_2$ by $\bar{a}_2$. It is obvious that

$$\bar{a}_1 = \frac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N}, \qquad \bar{a}_2 = \frac{(Y_1 + \cdots + Y_m) - (Y_{m+1} + \cdots + Y_N)}{N}. \tag{3}$$

On account of the condition IV we have

$$\bar{a}_2 = \alpha \bar{a}_1, \quad \text{or} \quad \frac{\bar{a}_2}{\bar{a}_1} = \alpha. \tag{4}$$

The variance of $a_1 - \bar{a}_1$ is equal to $\sigma_\epsilon^2/N$ and the variance of $a_2 - \bar{a}_2$ is equal
to $\sigma_\eta^2/N$. Hence $a_1$ and $a_2$ converge stochastically towards $\bar{a}_1$ and $\bar{a}_2$ respectively.
From that and assumption V it follows that also $a_2/a_1$ converges stochastically
towards $\bar{a}_2/\bar{a}_1 = \alpha$. The intercept $\beta$ of the regression line will be estimated by

$$b = \bar{y} - a\bar{x}, \quad \text{where } \bar{x} = \frac{x_1 + \cdots + x_N}{N} \text{ and } \bar{y} = \frac{y_1 + \cdots + y_N}{N}. \tag{5}$$

Denote by $\bar{X}$ the arithmetic mean of $X_1, \cdots, X_N$ and by $\bar{Y}$ the arithmetic
mean of $Y_1, \cdots, Y_N$. Since $\bar{y}$ converges stochastically towards $\bar{Y}$, $\bar{x}$ towards
$\bar{X}$, and $a$ towards $\alpha$, $b$ converges stochastically towards $\bar{Y} - \alpha\bar{X}$. From condition
IV it follows that $\bar{Y} - \alpha\bar{X} = \beta$. Hence $b$ converges stochastically
towards $\beta$.
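The estimates (1), (2) and (5) can be illustrated by a minimal sketch. The function name `wald_slope_intercept` and the sample data are hypothetical, not part of the paper; the sketch assumes that the observations are supplied in the order in which they are to be split into a first and a second half, and that $N$ is even.

```python
def wald_slope_intercept(x, y):
    """Grouping estimates a (slope) and b (intercept) from equations (1), (2), (5)."""
    n = len(x)
    if n == 0 or n != len(y) or n % 2 != 0:
        raise ValueError("x and y must have the same even, positive length")
    m = n // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / n   # equation (1), x-set
    a2 = (sum(y[:m]) - sum(y[m:])) / n   # equation (1), y-set
    if a1 == 0:
        raise ZeroDivisionError("a1 = 0: the two half-sums of x coincide")
    a = a2 / a1                           # equation (2)
    b = sum(y) / n - a * sum(x) / n       # equation (5)
    return a, b


# Hypothetical error-free data lying exactly on the line Y = 2X + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * v + 1.0 for v in xs]
a, b = wald_slope_intercept(xs, ys)
print(a, b)
```

With error-free data the estimates reproduce the slope and intercept of the line; with errors present they only approach $\alpha$ and $\beta$ as $N$ grows, under assumptions I-V.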

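Let us introduce the following notations:

$$s_x = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{N}} = \text{sample standard deviation of the } x\text{-observations},$$

$$s_y = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{N}} = \text{sample standard deviation of the } y\text{-observations},$$

$$s_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{N} = \text{sample covariance between the } x\text{-set and } y\text{-set}.$$

$s_X$, $s_Y$ and $s_{XY}$ denote the same expressions of the true values $X_1, \cdots, X_N$;
$Y_1, \cdots, Y_N$.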

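It is obvious that

$$E(s_x^2) = s_X^2 + \sigma_\epsilon^2 \frac{N-1}{N}, \tag{6}$$

$$E(s_y^2) = s_Y^2 + \sigma_\eta^2 \frac{N-1}{N}, \tag{7}$$

$$E(s_{xy}) = s_{XY}, \tag{8}$$

where $E(s_x^2)$, $E(s_y^2)$, and $E(s_{xy})$ denote the expected values of $s_x^2$, $s_y^2$, and $s_{xy}$.
Since $Y_i = \alpha X_i + \beta$, we have

$$s_Y = \alpha s_X, \tag{9}$$

$$s_{XY} = \alpha s_X^2. \tag{10}$$

From (8), (9) and (10) we get

$$s_X^2 = \frac{E(s_{xy})}{\alpha}, \tag{11}$$

$$s_Y^2 = \alpha E(s_{xy}). \tag{12}$$

If we substitute in (6) and (7) for $s_X^2$ and $s_Y^2$ their values in (11) and (12),
we get

$$\sigma_\epsilon^2 = \left[ E(s_x^2) - \frac{E(s_{xy})}{\alpha} \right] \frac{N}{N-1}, \tag{13}$$

$$\sigma_\eta^2 = \left[ E(s_y^2) - \alpha E(s_{xy}) \right] \frac{N}{N-1}. \tag{14}$$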


12 I observe that the equations (6), (7) and (8) are essentially the same as those investigated
by R. Frisch, Statistical Confluence Analysis, pp. 51-52. See also Allen’s equations
(4) l.c. p. 194.






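Since $s_x^2$, $s_y^2$, $s_{xy}$ converge stochastically towards their expected values and $a$
converges stochastically towards $\alpha$, the expressions

$$\left[ s_x^2 - \frac{s_{xy}}{a} \right] \frac{N}{N-1} \tag{15}$$

and

$$\left[ s_y^2 - a s_{xy} \right] \frac{N}{N-1} \tag{16}$$

are consistent estimates of $\sigma_\epsilon^2$ and $\sigma_\eta^2$ respectively.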
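The expressions (15) and (16) translate directly into a short computation. The sketch below is illustrative only: the function name `error_variance_estimates` and the data are hypothetical, and the slope estimate $a$ is passed in by the caller. Nothing in the formulas prevents a negative value in a small sample, so the sketch returns the raw expressions without truncation.

```python
def error_variance_estimates(x, y, a):
    """Evaluate expressions (15) and (16) for a given slope estimate a."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length, at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sx2 = sum((xi - xbar) ** 2 for xi in x) / n                       # s_x^2
    sy2 = sum((yi - ybar) ** 2 for yi in y) / n                       # s_y^2
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n  # s_xy
    factor = n / (n - 1)
    var_eps = (sx2 - sxy / a) * factor   # expression (15)
    var_eta = (sy2 - a * sxy) * factor   # expression (16)
    return var_eps, var_eta


# Hypothetical error-free data on Y = 2X + 1: both estimates should vanish.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * v + 1.0 for v in xs]
var_eps, var_eta = error_variance_estimates(xs, ys, 2.0)
print(var_eps, var_eta)
```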


4. Confidence Interval for $\alpha$. In this section, as well as in sections 5 and 6,
only the assumptions I-IV are assumed to hold. In other words, all statements
made in these sections are valid independently of Assumption V, except
where the contrary is explicitly stated.

Let us introduce the following notation:

$$\bar{x}_1 = \frac{x_1 + \cdots + x_m}{m}; \qquad \bar{y}_1 = \frac{y_1 + \cdots + y_m}{m};$$

$$\bar{x}_2 = \frac{x_{m+1} + \cdots + x_N}{m}; \qquad \bar{y}_2 = \frac{y_{m+1} + \cdots + y_N}{m};$$

$$(s'_x)^2 = \frac{\sum_{i=1}^{m} (x_i - \bar{x}_1)^2 + \sum_{j=m+1}^{N} (x_j - \bar{x}_2)^2}{N};$$

$$(s'_y)^2 = \frac{\sum_{i=1}^{m} (y_i - \bar{y}_1)^2 + \sum_{j=m+1}^{N} (y_j - \bar{y}_2)^2}{N};$$

$$s'_{xy} = \frac{\sum_{i=1}^{m} (x_i - \bar{x}_1)(y_i - \bar{y}_1) + \sum_{j=m+1}^{N} (x_j - \bar{x}_2)(y_j - \bar{y}_2)}{N}.$$

$\bar{X}_1$, $\bar{X}_2$, $\bar{Y}_1$, $\bar{Y}_2$, $(s'_X)^2$, $(s'_Y)^2$ and $s'_{XY}$ denote the same functions of the true
values $X_1, \cdots, X_N$, $Y_1, \cdots, Y_N$. The expressions $s'_x$, $s'_y$, and $s'_{xy}$ are
slightly different from the corresponding expressions $s_x$, $s_y$, and $s_{xy}$. The
reason for introducing these new expressions is that the distributions of $s_x$,
$s_y$, and $s_{xy}$ are not independent of the slope $a = a_2/a_1$ of the sample regression
line, but $s'_x$, $s'_y$ and $s'_{xy}$ are distributed independently from $a$ (assuming that $\epsilon$
and $\eta$ are normally distributed). The latter statement follows easily from the
fact that according to (1) and (2) $a = \dfrac{\bar{y}_1 - \bar{y}_2}{\bar{x}_1 - \bar{x}_2}$ and $s'_x$, $s'_y$, $s'_{xy}$ are distributed
independently of $\bar{x}_1$, $\bar{x}_2$, $\bar{y}_1$ and $\bar{y}_2$.





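In the same way as we derived (13) and (14), we get

$$\sigma_\epsilon^2 = \left[ E\big((s'_x)^2\big) - \frac{E(s'_{xy})}{\alpha} \right] \frac{N}{N-2}, \tag{13'}$$

$$\sigma_\eta^2 = \left[ E\big((s'_y)^2\big) - \alpha E(s'_{xy}) \right] \frac{N}{N-2}. \tag{14'}$$

These formulae differ from the corresponding formulae (13) and (14) only in
the denominator of the second factor, having there $N - 2$ instead of $N - 1$.
This is due to the fact that the estimates $s_x$, $s_y$, $s_{xy}$ are based on $N - 1$ degrees
of freedom whereas $s'_x$, $s'_y$ and $s'_{xy}$ are based only on $N - 2$ degrees of freedom.
From (13') and (14') we get the following estimates$^{13}$ for $\sigma_\epsilon^2$ and $\sigma_\eta^2$:

$$\left[ (s'_x)^2 - \frac{s'_{xy}}{\alpha} \right] \frac{N}{N-2}, \tag{17}$$

$$\left[ (s'_y)^2 - \alpha s'_{xy} \right] \frac{N}{N-2}. \tag{18}$$

Hence we get as an estimate of $\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2$ the expression:

$$s^2 = \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right] \frac{N}{N-2} = \frac{\sum_{i=1}^{m} \left[ (y_i - \alpha x_i) - (\bar{y}_1 - \alpha \bar{x}_1) \right]^2 + \sum_{j=m+1}^{N} \left[ (y_j - \alpha x_j) - (\bar{y}_2 - \alpha \bar{x}_2) \right]^2}{N-2}. \tag{19}$$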


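Now we shall show that

$$\frac{(N-2)s^2}{\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2} \tag{20}$$

has the $\chi^2$-distribution with $N - 2$ degrees of freedom, provided that $\epsilon$ and $\eta$
are normally distributed. In fact,

$$(y_i - \alpha x_i) - (\bar{y}_1 - \alpha \bar{x}_1) = \eta_i - \alpha \epsilon_i - (\bar{\eta}_1 - \alpha \bar{\epsilon}_1) \qquad (i = 1, \cdots, m)$$

and

$$(y_j - \alpha x_j) - (\bar{y}_2 - \alpha \bar{x}_2) = \eta_j - \alpha \epsilon_j - (\bar{\eta}_2 - \alpha \bar{\epsilon}_2) \qquad (j = m + 1, \cdots, N),$$

where

$$\bar{\epsilon}_1 = \frac{\epsilon_1 + \cdots + \epsilon_m}{m}, \quad \bar{\epsilon}_2 = \frac{\epsilon_{m+1} + \cdots + \epsilon_N}{m}, \quad \bar{\eta}_1 = \frac{\eta_1 + \cdots + \eta_m}{m}, \quad \bar{\eta}_2 = \frac{\eta_{m+1} + \cdots + \eta_N}{m}.$$

Since the variance of $\eta_k - \alpha \epsilon_k$ is equal to $\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2$ and since $\eta_k - \alpha \epsilon_k$ is
uncorrelated with $\eta_l - \alpha \epsilon_l$ $(k \neq l)$ $(k, l = 1, \cdots, N)$, the expression (20) has the
$\chi^2$-distribution with $N - 2$ degrees of freedom.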


13 An “estimate” is usually a function of the observations not involving any unknown
parameters. We designate here as estimates also some functions involving the parameter $\alpha$.





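Now we shall show that

$$\frac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2}} \tag{21}$$

is normally distributed with zero mean and unit variance. In fact from the
equations (1)-(4) it follows that

$$a_1 (a - \alpha) = a_2 - \alpha a_1 = \bar{a}_2 + \frac{\bar{\eta}_1 - \bar{\eta}_2}{2} - \alpha \left( \bar{a}_1 + \frac{\bar{\epsilon}_1 - \bar{\epsilon}_2}{2} \right) = \frac{\bar{\eta}_1 - \bar{\eta}_2}{2} - \alpha\, \frac{\bar{\epsilon}_1 - \bar{\epsilon}_2}{2}.$$

Since the latter expression is normally distributed (provided that $\epsilon$ and $\eta$ are
normally distributed) with zero mean and variance $\dfrac{\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2}{N}$, our statement
about (21) is proved.

Obviously (20) and (21) are independently distributed, hence $\sqrt{N - 2}$ times
the ratio of (21) to the square root of (20), namely,

$$t = \frac{\sqrt{N}\, a_1 (a - \alpha)}{s} = \frac{a_1 (a - \alpha) \sqrt{N - 2}}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}} \tag{22}$$

has the Student distribution with $N - 2$ degrees of freedom. Denote by $t_0$ the
critical value of $t$ corresponding to a chosen probability level. The deviation
of $a$ from an assumed population value $\alpha$ is significant if

$$\left| \frac{a_1 (a - \alpha) \sqrt{N - 2}}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}} \right| \geq t_0.$$

The confidence interval for $\alpha$ can be obtained by solving the equation in $\alpha$,

$$a_1^2 (a - \alpha)^2 = \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right] \frac{t_0^2}{N - 2}. \tag{23}$$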


Now we shall show that if the relation

$$a_1^2 > \frac{(s'_x)^2\, t_0^2}{N - 2} \tag{24}$$

holds, the roots $\alpha_1$ and $\alpha_2$ are real and $a$ is contained in the interior of the interval
$[\alpha_1, \alpha_2]$. From (19) it follows that

$$(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} > 0$$

for all values of $\alpha$. Hence, for $\alpha = a$ the left hand side of (23) is smaller than
the right hand side. On account of (24) there exists a value $\alpha' > a$ and a
value $\alpha'' < a$ such that the left hand side of (23) is greater than the right hand
side for $\alpha = \alpha'$ and $\alpha = \alpha''$. Hence one root must lie between $a$ and $\alpha'$ and the
other root between $\alpha''$ and $a$. This proves our statement. The relation (24)
always holds for sufficiently large $N$ if Assumption V is fulfilled. The confidence
interval of $\alpha$ is the interval $[\alpha_1, \alpha_2]$. For very small $N$ (24) may not hold.
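Solving (23) amounts to finding the two roots of a quadratic in $\alpha$. The following sketch shows one way to do it, under these assumptions: the observations are already ordered so that the first half forms the first group, $N$ is even, and the critical value $t_0$ is supplied by the caller from a table of the Student distribution. The function names and the data are hypothetical.

```python
import math


def split_half_stats(x, y):
    """Return a1, a and the within-group moments (s'_x)^2, (s'_y)^2, s'_xy."""
    n = len(x)
    m = n // 2
    x1bar, x2bar = sum(x[:m]) / m, sum(x[m:]) / m
    y1bar, y2bar = sum(y[:m]) / m, sum(y[m:]) / m
    sx2 = sy2 = sxy = 0.0
    for gx, gy, xb, yb in ((x[:m], y[:m], x1bar, y1bar),
                           (x[m:], y[m:], x2bar, y2bar)):
        for xi, yi in zip(gx, gy):
            sx2 += (xi - xb) ** 2
            sy2 += (yi - yb) ** 2
            sxy += (xi - xb) * (yi - yb)
    a1 = (x1bar - x2bar) / 2.0   # equals a_1 of equation (1) when m = N/2
    a = (y1bar - y2bar) / (x1bar - x2bar)
    return a1, a, sx2 / n, sy2 / n, sxy / n


def slope_confidence_interval(x, y, t0):
    """Roots of equation (23), returned as (lower, upper)."""
    n = len(x)
    a1, a, sx2, sy2, sxy = split_half_stats(x, y)
    c = t0 ** 2 / (n - 2)
    # Equation (23) rearranged as A*alpha^2 + B*alpha + C = 0.
    A = a1 ** 2 - c * sx2
    if A <= 0:
        raise ValueError("relation (24) does not hold for these data")
    B = -2.0 * (a1 ** 2 * a - c * sxy)
    C = a1 ** 2 * a ** 2 - c * sy2
    disc = max(B * B - 4.0 * A * C, 0.0)   # guard against rounding below zero
    root = math.sqrt(disc)
    return (-B - root) / (2.0 * A), (-B + root) / (2.0 * A)


# Hypothetical data; 2.447 is the two-sided 5% point of Student's t with 6 d.f.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.8, 8.1]
t0 = 2.447
lower, upper = slope_confidence_interval(xs, ys, t0)
print(lower, upper)
```

The check on `A` is relation (24): when it fails, the leading coefficient of the quadratic is not positive and the two roots no longer enclose $a$ in a finite interval.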

Finally I should like to remark that no essentially better estimate of the
variance of $\eta - \alpha\epsilon$ can be given than the expression $s^2$ in (19). In fact, we
have $2N$ observations $x_1, \cdots, x_N$; $y_1, \cdots, y_N$. For the estimation of the
variance of $\eta - \alpha\epsilon$ we must eliminate the unknowns $X_1, \cdots, X_N$ and $\beta$. (The
unknowns $Y_1, \cdots, Y_N$ are determined by the relations $Y_i = \alpha X_i + \beta$ and $\alpha$ is
involved in the expression whose variance is to be determined.) Hence we have
at most $N - 1$ degrees of freedom and the estimate in (19) is based on $N - 2$
degrees of freedom.


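5. Confidence Interval for $\beta$ if $\alpha$ is Given. In this case the best estimate of $\beta$
is given by the expression:

$$b_\alpha = \bar{y} - \alpha \bar{x}, \quad \text{where } \bar{x} = \frac{x_1 + \cdots + x_N}{N} \text{ and } \bar{y} = \frac{y_1 + \cdots + y_N}{N}.$$

We have

$$b_\alpha - \beta = (\bar{y} - \bar{Y}) - \alpha (\bar{x} - \bar{X}) = \bar{\eta} - \alpha \bar{\epsilon},$$

where

$$\bar{\epsilon} = \frac{\epsilon_1 + \cdots + \epsilon_N}{N}, \quad \text{and} \quad \bar{\eta} = \frac{\eta_1 + \cdots + \eta_N}{N}.$$

Hence,

$$\frac{\sqrt{N}\, (b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2}} \tag{25}$$

is normally distributed with zero mean and unit variance. It is obvious that
the expressions (20) and (25) are independently distributed. Hence $\sqrt{N - 2}$
times the ratio of (25) to the square root of (20), i.e.

$$t = \frac{\sqrt{N}\, (b_\alpha - \beta)}{s} = \frac{\sqrt{N - 2}\, (b_\alpha - \beta)}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}$$

has the Student distribution with $N - 2$ degrees of freedom. Denoting by $t_0$
the critical value of $t$ according to the chosen probability level, the confidence
interval for $\beta$ is given by the interval:

$$\left[ b_\alpha - t_0 \frac{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}{\sqrt{N - 2}}, \quad b_\alpha + t_0 \frac{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}{\sqrt{N - 2}} \right].$$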
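The interval for $\beta$ with $\alpha$ given is symmetric about $b_\alpha$, so it can be computed in a few lines. This is a sketch under stated assumptions: the first half of the supplied observations forms the first group, $N$ is even, and $t_0$ is read from a Student table by the caller. The names `within_group_moments` and `intercept_confidence_interval` are hypothetical.

```python
import math


def within_group_moments(x, y):
    """Return (s'_x)^2, (s'_y)^2 and s'_xy for the first-half / second-half split."""
    n = len(x)
    m = n // 2
    sx2 = sy2 = sxy = 0.0
    for gx, gy in ((x[:m], y[:m]), (x[m:], y[m:])):
        xb = sum(gx) / len(gx)
        yb = sum(gy) / len(gy)
        for xi, yi in zip(gx, gy):
            sx2 += (xi - xb) ** 2
            sy2 += (yi - yb) ** 2
            sxy += (xi - xb) * (yi - yb)
    return sx2 / n, sy2 / n, sxy / n


def intercept_confidence_interval(x, y, alpha, t0):
    """Interval b_alpha -/+ t0 * sqrt(Q / (N - 2)), Q being the bracket in (19)."""
    n = len(x)
    sx2, sy2, sxy = within_group_moments(x, y)
    q = max(sy2 + alpha ** 2 * sx2 - 2.0 * alpha * sxy, 0.0)  # rounding guard
    b_alpha = sum(y) / n - alpha * sum(x) / n
    half_width = t0 * math.sqrt(q / (n - 2))
    return b_alpha - half_width, b_alpha + half_width


# Hypothetical error-free data on Y = 2X + 1 with the true slope supplied;
# 2.776 is the two-sided 5% point of Student's t with 4 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * v + 1.0 for v in xs]
lower, upper = intercept_confidence_interval(xs, ys, 2.0, 2.776)
print(lower, upper)
```

For error-free data and the true slope the bracket in (19) vanishes, so the interval collapses onto the true intercept.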







6. Confidence Region for $\alpha$ and $\beta$ Jointly. In most practical cases we want to
know confidence limits for $\alpha$ and $\beta$ jointly. A pair of values $\alpha$, $\beta$ can be represented
in the plane by the point with the coordinates $\alpha$, $\beta$. A region $R$ of this
plane is called confidence region of the true point $(\alpha, \beta)$ corresponding to the
probability level $P$ if the following two conditions are fulfilled.

(1) The region $R$ is a function of the observations $x_1, \cdots, x_N$; $y_1, \cdots, y_N$,
i.e. it is uniquely determined by the observations.

(2) Before performing the experiment the probability that we shall obtain
observed values such that $(\alpha, \beta)$ will be contained in $R$, is exactly equal to $P$.
$P$ is usually chosen to be equal to .95 or .99.

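We have shown that the expressions (21) and (25), i.e.

$$\frac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2}}, \qquad \frac{\sqrt{N}\, (b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2 \sigma_\epsilon^2}}$$

are normally distributed with zero mean and unit variance. Now we shall
show that these two quantities are independently distributed. For this purpose
we have only to show that $\bar{x}$, $\bar{y}$, $a_1$ and $a_2$ are independently distributed ($a_1$ and $a_2$
are defined in (1)), but since

$$a_1 - E(a_1) = (\bar{\epsilon}_1 - \bar{\epsilon}_2)/2$$

$$a_2 - E(a_2) = (\bar{\eta}_1 - \bar{\eta}_2)/2$$

$$\bar{x} - E(\bar{x}) = \bar{\epsilon}$$

$$\bar{y} - E(\bar{y}) = \bar{\eta},$$

we have only to show that $\bar{\epsilon}$, $\bar{\eta}$, $\bar{\epsilon}_1 - \bar{\epsilon}_2$, $\bar{\eta}_1 - \bar{\eta}_2$ are independently distributed.
We obviously have

$$\bar{\epsilon} = \frac{\bar{\epsilon}_1 + \bar{\epsilon}_2}{2}, \qquad \bar{\eta} = \frac{\bar{\eta}_1 + \bar{\eta}_2}{2}.$$

It is evident that $\bar{\epsilon}_1$, $\bar{\epsilon}_2$, $\bar{\eta}_1$ and $\bar{\eta}_2$ are independently distributed. Hence,
$E[\bar{\epsilon}(\bar{\epsilon}_1 - \bar{\epsilon}_2)] = (E\bar{\epsilon}_1^2 - E\bar{\epsilon}_2^2)/2 = 0$ and also $E[\bar{\eta}(\bar{\eta}_1 - \bar{\eta}_2)] = (E\bar{\eta}_1^2 - E\bar{\eta}_2^2)/2 = 0$.
Since $\bar{\epsilon}_1 - \bar{\epsilon}_2$, $\bar{\eta}_1 - \bar{\eta}_2$, and $\bar{\epsilon}$ and $\bar{\eta}$ are normally distributed, the independence
of this set of variables is proved, and therefore also (21) and (25) are independently
distributed. It is obvious that the expression (20) is distributed
independently of (21) and (25). From this it follows that

$$\frac{N - 2}{2} \cdot \frac{N \left[ a_1^2 (a - \alpha)^2 + (\bar{y} - \alpha \bar{x} - \beta)^2 \right]}{(N - 2) s^2} = \frac{(N - 2) \left[ a_1^2 (a - \alpha)^2 + (\bar{y} - \alpha \bar{x} - \beta)^2 \right]}{2 \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right]} \tag{26}$$

has the $F$-distribution (analysis of variance distribution) with 2 and $N - 2$
degrees of freedom. The $F$-distribution is tabulated in Snedecor’s book: Calculation
and Interpretation of Analysis of Variance, Collegiate Press, Ames, Iowa,
1934. The distribution of $\frac{1}{2} \log F = z$ is tabulated in R. A. Fisher’s book:
Statistical Methods for Research Workers, London, 1936. Denote by $F_0$ the
critical value of $F$ corresponding to the chosen probability level $P$. Then the
confidence region $R$ is the set of points $(\alpha, \beta)$ which satisfy the inequality

$$\frac{N - 2}{2} \cdot \frac{a_1^2 (a - \alpha)^2 + (\bar{y} - \alpha \bar{x} - \beta)^2}{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}} \leq F_0. \tag{27}$$

The boundary of the region is given by the equation

$$a_1^2 (a - \alpha)^2 + (\bar{y} - \alpha \bar{x} - \beta)^2 = \frac{2 F_0}{N - 2} \left[ (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} \right]. \tag{28}$$

This is the equation of an ellipse. Hence the region $R$ is the interior of the
ellipse defined by the equation (28). If Assumption V holds, the lengths of the
axes of the ellipse are of the order $1/\sqrt{N}$, hence with increasing $N$ the ellipse
reduces to a point.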
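Whether a candidate point $(\alpha, \beta)$ lies in the region $R$ can be checked by evaluating the left hand side of (27). The sketch below assumes the first-half / second-half grouping, an even $N$, data for which the denominator of (27) is not zero, and a critical value $F_0$ taken from a table by the caller. All function names and the data are hypothetical.

```python
def split_half_summary(x, y):
    """Collect the quantities entering inequality (27)."""
    n = len(x)
    m = n // 2
    x1bar, x2bar = sum(x[:m]) / m, sum(x[m:]) / m
    y1bar, y2bar = sum(y[:m]) / m, sum(y[m:]) / m
    sx2 = sy2 = sxy = 0.0
    for gx, gy, xb, yb in ((x[:m], y[:m], x1bar, y1bar),
                           (x[m:], y[m:], x2bar, y2bar)):
        for xi, yi in zip(gx, gy):
            sx2 += (xi - xb) ** 2
            sy2 += (yi - yb) ** 2
            sxy += (xi - xb) * (yi - yb)
    return {
        "n": n,
        "xbar": sum(x) / n,
        "ybar": sum(y) / n,
        "a1": (x1bar - x2bar) / 2.0,
        "a": (y1bar - y2bar) / (x1bar - x2bar),
        "sx2": sx2 / n,
        "sy2": sy2 / n,
        "sxy": sxy / n,
    }


def joint_region_statistic(x, y, alpha, beta):
    """Left hand side of inequality (27) at the point (alpha, beta)."""
    s = split_half_summary(x, y)
    numerator = (s["a1"] ** 2 * (s["a"] - alpha) ** 2
                 + (s["ybar"] - alpha * s["xbar"] - beta) ** 2)
    denominator = s["sy2"] + alpha ** 2 * s["sx2"] - 2.0 * alpha * s["sxy"]
    return (s["n"] - 2) / 2.0 * numerator / denominator


def in_confidence_region(x, y, alpha, beta, f0):
    return joint_region_statistic(x, y, alpha, beta) <= f0


# Hypothetical data; 5.14 is the 5% point of F with 2 and 6 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.8, 8.1]
summary = split_half_summary(xs, ys)
b = summary["ybar"] - summary["a"] * summary["xbar"]
print(in_confidence_region(xs, ys, summary["a"], b, 5.14))
print(in_confidence_region(xs, ys, 10.0, 0.0, 5.14))
```

The estimated point $(a, b)$ makes the numerator of (27) vanish and therefore always lies inside the region.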


7. The Grouping of the Observations. We have divided the observations in
two equal groups $G_1$ and $G_2$, $G_1$ containing the first half $(x_1, y_1), \cdots, (x_m, y_m)$
and $G_2$ the second half $(x_{m+1}, y_{m+1}), \cdots, (x_N, y_N)$ of the observations. All
the formulas and statements of the previous sections remain exactly valid for
any arbitrary subdivision of the observations in two equal groups, provided
that the subdivision is defined independently of the errors $\epsilon_1, \cdots, \epsilon_N$;
$\eta_1, \cdots, \eta_N$. The question of which is the most advantageous grouping arises,
i.e. for which grouping will $a$ be the most efficient estimate of $\alpha$ (will lead to
the shortest confidence interval for $\alpha$). It is easy to see that the greater $|a_1|$
the more efficient is the estimate $a$ of $\alpha$. The expression $|a_1|$ becomes a maximum
if we order the observations such that $x_1 < x_2 < \cdots < x_N$. That is to
say $|a_1|$ becomes a maximum if we group the observations according to the
following:

Rule I. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements
$x_j$ $(j \neq i)$ of the series $x_1, \cdots, x_N$ for which $x_j < x_i$ is less than $m = N/2$. The
point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $x_j$ $(j \neq i)$ for which $x_j < x_i$
is greater than or equal to $m$.
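Rule I can be sketched as a direct count of smaller elements. The function name `rule_one_groups` and the data are hypothetical; the sketch uses the strict inequality as printed above, so when several $x_i$ are equal the two groups may come out with unequal sizes, a case the rule as stated does not resolve.

```python
def rule_one_groups(x, y):
    """Split the points into G1 and G2 according to Rule I."""
    n = len(x)
    m = n // 2
    g1, g2 = [], []
    for i in range(n):
        # Number of other elements x_j that are smaller than x_i.
        smaller = sum(1 for j in range(n) if j != i and x[j] < x[i])
        if smaller < m:
            g1.append((x[i], y[i]))
        else:
            g2.append((x[i], y[i]))
    return g1, g2


# Hypothetical observations given in an arbitrary order.
xs = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
ys = [50.0, 10.0, 40.0, 20.0, 60.0, 30.0]
g1, g2 = rule_one_groups(xs, ys)
print(g1)
print(g2)
```

For distinct values this is the same as sorting the points by $x$ and cutting the sorted list in the middle.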

This grouping, however, depends on the observed values $x_1, \cdots, x_N$ and is
therefore in general not entirely independent of the errors $\epsilon_1, \cdots, \epsilon_N$. Let us
now consider the grouping according to the following:

Rule II. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements
$X_j$ of the series $X_1, \cdots, X_N$ for which $X_j < X_i$ $(j \neq i)$ is less than $m$. The
point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $X_j$ for which $X_j < X_i$ $(j \neq i)$
is equal to or greater than $m$.





The grouping according to Rule II is entirely independent of the errors
$\epsilon_1, \cdots, \epsilon_N$; $\eta_1, \cdots, \eta_N$. It is identical with the grouping according to Rule I
in the following case: Denote by $\tilde{x}$ the median of $x_1, \cdots, x_N$; assume that $\epsilon$
can take values only within the finite interval $[-c, +c]$ and that all the values
$x_1, \cdots, x_N$ fall outside the interval $[\tilde{x} - c, \tilde{x} + c]$. It is easy to see that in
this case $x_i < \tilde{x}$ $(i = 1, \cdots, N)$ holds if and only if $X_i < \tilde{X}$, where $\tilde{X}$ denotes
the median of $X_1, \cdots, X_N$. Hence the grouping according to Rule II is
identical to that according to Rule I and therefore the grouping according to
Rule I is independent of the errors $\epsilon_1, \cdots, \epsilon_N$. In such cases we get the best
estimate of $\alpha$ by grouping the observations according to Rule I. Practically,
we can use the grouping according to Rule I and regard it as independent of the
errors $\epsilon_1, \cdots, \epsilon_N$; $\eta_1, \cdots, \eta_N$ if there exists a positive value $c$ for which the
probability that $|\epsilon| > c$ is negligibly small and the number of observations
contained in $[\tilde{x} - c, \tilde{x} + c]$ is also very small.

Denote by $a'$ the value of $a$ which we obtain by grouping the observations
according to Rule I and by $a''$ the value of $a$ if we group the observations
according to Rule II. The value $a''$ is in general unknown, since the values
$X_1, \cdots, X_N$ are unknown, except in the special case considered above, when
we have $a'' = a'$. We will now show that an upper and a lower limit for $a''$
can always be given. First, we have to determine a positive value $c$ such that
the probability that $|\epsilon| > c$ is negligibly small. The value of $c$ may often be
determined before we make the observations having some a priori knowledge
about the possible range of the errors. If this is not the case, we can estimate
the value of $c$ from the data. It is well known that if we have errors in both
variables and fit a straight line by the method of least squares minimizing in
the $x$-direction, the sum of the squared deviations divided by the number of
degrees of freedom will overestimate $\sigma_\epsilon^2$. Hence, if $\epsilon$ is normally distributed,
we can consider the interval $[-3v, 3v]$ as the possible range of $\epsilon$, i.e. $c = 3v$,
where $v^2$ denotes the sum of the squared residuals divided by the number of
degrees of freedom. If the distribution of $\epsilon$ is unknown, we shall have to take
for $c$ a somewhat larger value, for instance $c = 5v$. After having determined $c$,
upper and lower limits for $a''$ can be given as follows: we consider the system $S$
of all possible groupings satisfying the conditions:

(1) If $x_i < \tilde{x} - c$ the point $(x_i, y_i)$ belongs to the group $G_1$.

(2) If $x_i > \tilde{x} + c$ the point $(x_i, y_i)$ belongs to the group $G_2$.

We calculate the value of $a$ according to each grouping of the system $S$ and
denote the minimum of these values by $a^*$, and the maximum by $a^{**}$. Since
the grouping according to Rule II is contained in the system $S$, $a^*$ is a lower
and $a^{**}$ an upper limit of $a''$.
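The limits $a^*$ and $a^{**}$ can be found by enumerating the system $S$. The sketch below is one possible reading, under several assumptions that are not in the text: groupings are restricted to two groups of equal size $m = N/2$, the median is computed by `statistics.median` (which averages the two middle values when $N$ is even), and groupings whose two $x$-sums coincide are skipped because the slope is undefined for them. The function name `slope_bounds` and the data are hypothetical.

```python
from itertools import combinations
from statistics import median


def slope_bounds(x, y, c):
    """Return (a*, a**): the smallest and largest slope over the system S."""
    n = len(x)
    m = n // 2
    med = median(x)
    low = [i for i in range(n) if x[i] < med - c]            # forced into G1
    free = [i for i in range(n) if med - c <= x[i] <= med + c]
    need = m - len(low)                                      # free points G1 still needs
    if need < 0 or need > len(free):
        raise ValueError("no grouping into two equal halves satisfies (1) and (2)")
    total_x, total_y = sum(x), sum(y)
    slopes = []
    for extra in combinations(free, need):
        members = low + list(extra)
        sx1 = sum(x[i] for i in members)
        sy1 = sum(y[i] for i in members)
        sx2, sy2 = total_x - sx1, total_y - sy1
        if sx1 != sx2:
            # With equal group sizes the ratio of differences of sums
            # equals the ratio of differences of means.
            slopes.append((sy1 - sy2) / (sx1 - sx2))
    if not slopes:
        raise ValueError("the slope is undefined for every grouping in S")
    return min(slopes), max(slopes)


# Hypothetical data lying exactly on Y = X: every grouping gives slope 1.
xs = [1.0, 2.0, 3.0, 3.9, 4.1, 5.0, 6.0, 7.0]
ys = list(xs)
a_star, a_double_star = slope_bounds(xs, ys, 0.2)
print(a_star, a_double_star)
```

The number of groupings grows combinatorially with the number of points inside $[\tilde{x} - c, \tilde{x} + c]$, which is why the enumeration is only practical when that number is small.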

Let $g$ be a grouping contained in $S$ and denote by $I_g$ the confidence interval
for $\alpha$ which we obtain from formula (23) using the grouping $g$. Denote further
by $I$ the smallest interval which contains the intervals $I_g$ for all elements $g$
of $S$. Then $I$ contains also the confidence interval corresponding to the grouping
according to Rule II. If we denote by $P$ the chosen probability level (say
$P = .95$), then we can say: If we were to draw a sample consisting of $N$ pairs
of observations $(x_1, y_1), \cdots, (x_N, y_N)$, the probability is greater than or equal
to $P$ that we shall obtain a system of observations such that the interval $I$ will
include the true slope $\alpha$.

The computing work for the determination of $I$ may be considerable if the
number of observations within the interval $[\tilde{x} - c, \tilde{x} + c]$ around the median $\tilde{x}$
of $x_1, \cdots, x_N$ is not small. We
can get a good approximation to $I$ by less computation work as follows: First
we calculate the slope $a'$ using the grouping according to Rule I and determine
the confidence interval $[a' - \delta, a' + \Delta]$ according to formula (23). Denote by
$a(g)$ the value of the slope, i.e. the value of $\dfrac{\bar{y}_1 - \bar{y}_2}{\bar{x}_1 - \bar{x}_2}$, corresponding to a grouping
$g$ of the system $S$, and by $[a(g) - \delta_g, a(g) + \Delta_g]$ the corresponding confidence
interval calculated from (23). Neglecting the differences $(\delta_g - \delta)$ and $(\Delta_g - \Delta)$,
we obtain for $I$ the interval $[a^* - \delta, a^{**} + \Delta]$.

If the difference $a^{**} - a^*$ is small, we can consider $I = [a^* - \delta, a^{**} + \Delta]$ as
the correct confidence interval of $\alpha$ corresponding to the chosen probability
level $P$. If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large.
In such cases we may get a much shorter confidence interval by using some
other grouping defined independently of the errors $\epsilon_1, \cdots, \epsilon_N$; $\eta_1, \cdots, \eta_N$.
For instance if we see that the values $x_1, \cdots, x_N$ considered in the order as
they have been observed, show a monotonically increasing (or decreasing) tendency,
we shall define the group $G_1$ as the first half, and the group $G_2$ as the
second half of the observations. Though we decide to make this grouping after
having observed that the values $x_1, \cdots, x_N$ show a clear trend, the grouping
can be considered as independent of the errors $\epsilon_1, \cdots, \epsilon_N$. In fact, if the
range of the error $\epsilon$ is small in comparison to the true part $X$, the trend tendency
of the values $x_1, \cdots, x_N$ will not be affected by the size of the errors $\epsilon_1, \cdots, \epsilon_N$.
We may use for the grouping also any other property of the data which is
independent of the errors.

The results of the preceding considerations can be summarized as follows:

We use first the grouping according to Rule I, calculate the slope $a' = \dfrac{\bar{y}_1 - \bar{y}_2}{\bar{x}_1 - \bar{x}_2}$
and the corresponding confidence interval $[a' - \delta, a' + \Delta]$ (formula (23)). This
confidence interval cannot be considered as exact since the grouping according
to Rule I is not completely independent of the errors. In order to take account
of this fact, we calculate $a^*$ and $a^{**}$. If $a^{**} - a^*$ is small, we consider $I =
[a^* - \delta, a^{**} + \Delta]$ with practical approximation as the correct confidence interval.
If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large. We can
only say that $I$ is a confidence interval corresponding to a probability level
greater than or equal to the chosen one. In such cases we should try to use
some other grouping defined independently of the errors, which eventually will
lead to a considerably shorter confidence interval.

Analogous considerations hold regarding the joint confidence region for $\alpha$
and $\beta$. We use the grouping according to Rule I and calculate from (27) the
corresponding confidence region $R$. If $|a^{**} - a^*|$ and $|b^{**} - b^*|$ are small
($b^* = \bar{y} - a^* \bar{x}$ and $b^{**} = \bar{y} - a^{**} \bar{x}$) we enlarge $R$ to a region $\bar{R}$ corresponding
to the fact that $a$ and $b$ may take any values within the intervals bounded by $a^*$, $a^{**}$ and
by $b^*$, $b^{**}$ respectively. The region $\bar{R}$ can be considered with practical approximation
as the correct confidence region. If $|a^{**} - a^*|$ or $|b^{**} - b^*|$ is large,
we may try some other grouping defined independently of the errors, which
may lead to a smaller confidence region. In any case $\bar{R}$ represents a confidence
region corresponding to a probability level greater than or equal to the
chosen one.

8. Some Remarks on the Consistency of the Estimates of $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. We
have shown in section 3 that the given estimates of $\alpha$, $\beta$, $\sigma_\epsilon$ and $\sigma_\eta$ are consistent
if condition V is satisfied.

If the values $X_1, \cdots, X_N$ are not obtained by random sampling, it will in
general be possible to define a grouping which is independent of the errors and
for which condition V is satisfied. We can sometimes arrange the experiments
such that no values of the series $x_1, \cdots, x_N$ should be within the interval
$[\tilde{x} - c, \tilde{x} + c]$ where $\tilde{x}$ denotes the median of $x_1, \cdots, x_N$ and $c$ the range of
the error $\epsilon$. In such cases, as we saw, the grouping according to Rule I is
independent of the errors. Condition V is certainly satisfied if we group the
data according to Rule I.

Let us now consider the case that $X_1, \cdots, X_N$ are random variables independently
distributed, each having the same distribution. Denote by $X$ a
random variable having the same probability distribution as possessed by each
of the random variables $X_1, \cdots, X_N$. Assuming that $X$ has a finite second
moment, the expression in condition V will approach zero stochastically with
$N \to \infty$ for any grouping defined independently of the values $X_1, \cdots, X_N$.
It is possible, however, to define a grouping independent of the errors (but not
independent of $X_1, \cdots, X_N$) for which the expression in V does not approach
zero, provided that $X$ has the following property: There exists a real value $\lambda$
such that the probability that $X$ will lie within the interval $[\lambda - c, \lambda + c]$
($c$ denotes the range of the error $\epsilon$) is zero, the probability that $X > \lambda + c$
is positive, and the probability that $X < \lambda - c$ is positive. The grouping can
be defined, for instance, as follows:

The $i$-th observation $(x_i, y_i)$ belongs to the group $G_1$ if $x_i \leq \lambda$ and to $G_2$ if
$x_i > \lambda$. We continue the grouping according to this rule up to a value $i$ for
which one of the groups $G_1$, $G_2$ contains already $N/2$ elements. All further
observations belong to the other group.

It is easy to see that the probability is equal to 1 that the relation $x_i \leq \lambda$
is equivalent to the relation $X_i < \lambda - c$ and the relation $x_i > \lambda$ is equivalent to
the relation $X_i > \lambda + c$. Hence this grouping is independent of the errors.
Since for this grouping condition V is satisfied, our statement is proved.

If $X$ has not the property described above, it may happen that for every
grouping defined independently of the errors, the expression in condition V
converges always to zero stochastically. Such a case arises for instance if $X$, $\epsilon$ and
$\eta$ are normally distributed.$^{14}$ It can be shown that in this case no consistent
estimates of the parameters $\alpha$ and $\beta$ can be given, unless we have some additional
information not contained in the data (for instance we know a priori the
ratio $\sigma_\epsilon / \sigma_\eta$).

9. Structural Relationship and Prediction.$^{15}$ The problem discussed in this
paper was the question as to how to estimate the relationship between the true
parts $X$ and $Y$. We shall call the relationship between the true parts the structural
relationship. The problem of finding the structural relationship must not
be confused with the problem of prediction of one variable by means of the
other. The problem of prediction can be formulated as follows: We have observed
$N$ pairs of values $(x_1, y_1), \cdots, (x_N, y_N)$. A new observation on $x$ is
given and we have to estimate the corresponding value of $y$ by means of our
previous observations $(x_1, y_1), \cdots, (x_N, y_N)$. One might think that if we have
estimated the structural relationship between $X$ and $Y$, we may estimate $y$ by
the same relationship. That is to say, if the estimated structural relationship
is given by $Y = aX + b$, we may estimate $y$ from $x$ by the same formula:
$y = ax + b$. This procedure may lead, however, to a biased estimate of $y$.
This is, for instance, the case if $X$, $\epsilon$ and $\eta$ are normally distributed. It can
easily be shown in this case that for any given $x$ the conditional expectation of
$y$ is a linear function of $x$, that the slope of this function is different from the
slope of the structural relationship, and that among all unbiased estimates of
$y$ which are linear functions of $x$, the estimate obtained by the method of least
squares has the smallest variance. Hence in this case we have to use the least
square estimate for purposes of prediction. Even if we knew exactly the
structural relationship $Y = \alpha X + \beta$, we would get a biased estimate of $y$ by
putting $y = \alpha x + \beta$.

Let us consider now the following example: $X$ is a random variable having
a rectangular distribution with the range $[0, 1]$. The random variable $\epsilon$ has a
rectangular distribution with the range $[-0.1, +0.1]$. For any given $x$ let us
denote the conditional expectation of $y$ by $E(y \mid x)$ and the conditional expectation
of $X$ by $E(X \mid x)$. Then we obviously have

$$E(y \mid x) = \alpha E(X \mid x) + \beta.$$

Now let us calculate $E(X \mid x)$. It is obvious that the joint distribution of $X$ and
$\epsilon$ is given by the density function:

$$5\, dX\, d\epsilon,$$


14 I wish to thank Professor Hotelling for drawing my attention to this case.

15 I should like to express my thanks to Professor Hotelling for many interesting suggestions
and remarks on this subject.





where $X$ can take any value within the interval $[0, 1]$ and $\epsilon$ can take any value
within $[-0.1, +0.1]$. From this we obtain easily that the joint distribution of
$x$ and $X$ is given by the density function

$$5\, dx\, dX,$$

where $x$ can take any value within the interval $[-0.1, 1.1]$ and $X$ can take any
value lying in both intervals $[0, 1]$ and $[x - 0.1, x + 0.1]$ simultaneously. Denote
by $I_x$ the common part of these two intervals. Then for any fixed $x$ the
relative distribution of $X$ is given by the probability density

$$\frac{dX}{\int_{I_x} dX}.$$

Hence, we have

$$E(X \mid x) = \frac{\int_{I_x} X\, dX}{\int_{I_x} dX}.$$

We have to consider 3 cases:

(1) $0.1 < x < 0.9$. In this case $I_x = [x - 0.1, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{x - 0.1}^{x + 0.1} X\, dX}{\int_{x - 0.1}^{x + 0.1} dX} = x.$$

(2) $-0.1 < x < 0.1$. Then $I_x = [0, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{0}^{x + 0.1} X\, dX}{\int_{0}^{x + 0.1} dX} = .5x + .05.$$

(3) $0.9 < x < 1.1$. Then $I_x = [x - 0.1, 1]$ and

$$E(X \mid x) = \frac{\int_{x - 0.1}^{1} X\, dX}{\int_{x - 0.1}^{1} dX} = .5x + .45.$$





Since

$$E(y \mid x) = \alpha E(X \mid x) + \beta$$

we see that the structural relationship gives an unbiased prediction of $y$ from $x$
if $0.1 < x < 0.9$, but not in the other cases.
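The three cases of the rectangular example reduce to taking the midpoint of the common part of $[0, 1]$ and $[x - 0.1, x + 0.1]$, since the conditional density of $X$ is constant there. The sketch below illustrates this; the function names, the parameter `half_width` and the numerical values of $\alpha$ and $\beta$ are hypothetical choices for the illustration.

```python
def conditional_mean_true_value(x, half_width=0.1):
    """E(X | x) when X is rectangular on [0, 1] and the error is
    rectangular on [-half_width, +half_width]."""
    if not -half_width <= x <= 1.0 + half_width:
        raise ValueError("x lies outside the possible range of the observations")
    lower = max(0.0, x - half_width)   # common part of [0, 1] and
    upper = min(1.0, x + half_width)   # [x - half_width, x + half_width]
    return (lower + upper) / 2.0       # mean of a uniform density on that interval


def prediction_bias(x, alpha, beta):
    """Difference between the structural prediction alpha*x + beta and E(y | x)."""
    expected_y = alpha * conditional_mean_true_value(x) + beta
    return (alpha * x + beta) - expected_y


# Hypothetical structural line with alpha = 2 and beta = 1.
for x in (0.0, 0.5, 1.0):
    print(x, conditional_mean_true_value(x), prediction_bias(x, 2.0, 1.0))
```

In the middle range the bias is zero; near either end of the range the structural line predicts a value of $y$ that differs from the conditional expectation.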

The problem of cases for which the structural relationship is appropriate also
for purposes of prediction, needs further investigation. I should like to mention
a class of cases where the structural relationship has to be used also for prediction.
Assume that we have observed $N$ values $(x_1, y_1), \cdots, (x_N, y_N)$ of the variables
$x$ and $y$ for which the conditions I-IV of section 2 hold. Then we make a new
observation on $x$ obtaining the value $x'$. We assume that the last observation
on $x$ has been made under changed conditions such that we are sure that $x'$ does
not contain error, i.e. $x'$ is equal to the true part $X'$. Such a situation may arise
for instance if the error $\epsilon$ is due to errors of measurement and the last observation
has been made with an instrument of great precision for which the error of
measurement can be neglected. In such cases the prediction of the corresponding
$y'$ has to be made by means of the estimated structural relationship, i.e. we
have to put $y' = ax' + b$.

The knowledge of the structural relationship is essential for constructing any 
theory in the empirical sciences. The laws of the empirical sciences mostly 
express relationships among a limited number of variables which would prevail 
exactly if the disturbing influence of a great number of other variables could 
be eliminated. In our experiments we never succeed in eliminating completely 
these disturbances. Hence in deducing laws from observations, we have the 
task of estimating structural relationships. 

Columbia University, 

New York, N. Y. 



A METHOD FOR MINIMIZING THE SUM OF ABSOLUTE VALUES 

OF DEVIATIONS 


By Robert R. Singleton 

1. Introduction. In the Philosophical Magazine, 7th series, May 1930, E. C.
Rhodes described a method of computation for the estimation of parameters 
by minimizing the sum of absolute values of deviations. His is an iterative 
and recursive method, in the following sense. There is a direct method for 
minimization with one parameter. Assuming a method for minimization with 
n — 1 parameters, Rhodes imposes a relation between the n parameters (in an 
n-parameter problem) and finds a restricted minimum by the method for n — 1 
parameters. In this sense his method is recursive. He then repeats the process, 
by imposing on the n parameters a new relation determined by the restricted
minimum. In this sense his method is iterative. The process is finite, ending 
when a restricted minimum immediately succeeds itself, indicating a true 
minimum. 

Rhodes' paper presents the method without proof. The purpose of the 
present paper is to analyze the situation in detail sufficient to indicate proofs 
for various methods, and to present a new method which reduces the labor of 
solution by eliminating the recursive feature. The iterative approach is re¬ 
tained. The solution of Rhodes' illustrative problem will be given for com¬ 
parison between the two methods. 

The paper uses geometric terminology and develops to quite an extent the 
geometry of a surface representing the summed absolute deviations. This 
seems the clearest means of presenting the relationships. Further analysis of 
the properties of this surface should lead to an even more direct method for 
attaining the minimum than the one here presented. 

In the writing of the paper, no attention has been given to sets of observa¬ 
tions or equations among which a linear dependence may exist. In practice, 
such a situation almost never occurs. If the need arises, the adjustments 
which must be made to take care of dependence are in each case fairly obvious. 

2. Geometric Analogue of Summed Absolute Deviations. Let n observations 
on $\nu + 1$ variates be represented by $x^i_\alpha$, $y^i$, where $i = 1, \dots, n$; 
$\alpha = 1, \dots, \nu$. Unless otherwise noted, latin indices have range 1 to n, 
greek indices, 1 to $\nu$. The summation convention of tensor analysis is used. 

The variates are to be statistically related by the linear function¹ 

$\bar{y}^i = x^i_\alpha u^\alpha,$ 

¹ This includes the linear function with a constant, since a variate $x^i_\alpha \equiv 1$ may be used. 



$\bar{y}^i$ being an estimate of $y^i$. The $u^\alpha$ are to be determined so that 
$v = \sum_i | \bar{y}^i - y^i |$ is a minimum. Set 

(1) $v^i = x^i_\alpha u^\alpha - y^i$ 

and determine functions $e^i(u^\alpha)$ so that $e^i v^i > 0$, and $| e^i | = 1$. It is immaterial 
that $e^i$ is not uniquely determined when $u^\alpha$ satisfies $v^i = 0$. Then $v = \sum_i e^i v^i$ 
is to be minimized. Using (1), 

(2) $v = x_\alpha u^\alpha - y$ 

where 

$x_\alpha = \sum_i e^i x^i_\alpha, \qquad y = \sum_i e^i y^i.$ 

Consider a Euclidean $(\nu + 1)$-space, $E_{\nu+1}$, with coordinates $u^1, \dots, u^\nu, v$. 
The coordinate hyperplane perpendicular to the $v$-axis will be called $E_\nu$. In 
$E_{\nu+1}$ each of equations (1) for a particular $i$ represents a $\nu$-plane which intersects 
$E_\nu$ in a $(\nu - 1)$-plane when $v^i = 0$. Each of the equations 

(3) $v^i = e^i (x^i_\alpha u^\alpha - y^i)$ 

represents two half-planes which touch $E_\nu$ and each other along the 
$(\nu - 1)$-plane given in $E_\nu$ by the equation 

(4) $x^i_\alpha u^\alpha - y^i = 0.$ 

The functions on the right-hand side of (3) are thus continuous everywhere, 
and linear in any neighborhood of $E_\nu$ none of whose points satisfies (4). Since 
a sum of functions continuous and linear in a neighborhood is also continuous 
and linear in that neighborhood, it follows that the function on the right in (2) 
is continuous for all $u$, and linear for every neighborhood of $E_\nu$ containing no 
points which satisfy (4) for any $i$. Hence 

Observation I: The surface (S) given in $E_{\nu+1}$ by (2) consists of portions of 
$\nu$-planes joined together. The projection of these joins on $E_\nu$ forms a network of 
$(\nu - 1)$-planes determined in $E_\nu$ by equations (4). 
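
The equivalence of the summed absolute deviations with the linear form (2) may be checked numerically. The following is a minimal sketch only: the function names and the small data set are illustrative assumptions, not part of the paper, and the sign $e^i = +1$ is chosen arbitrarily where a deviation vanishes.

```python
def deviations(x, y, u):
    """Signed deviations v^i = sum_a x[i][a]*u[a] - y[i], as in (1)."""
    return [sum(xa * ua for xa, ua in zip(row, u)) - yi for row, yi in zip(x, y)]


def signs(v):
    """e^i with |e^i| = 1 and e^i v^i >= 0 (+1 chosen arbitrarily when v^i == 0)."""
    return [1 if vi >= 0 else -1 for vi in v]


def summed_abs(x, y, u):
    """Direct evaluation of the sum of absolute deviations."""
    return sum(abs(vi) for vi in deviations(x, y, u))


def summed_abs_linear_form(x, y, u):
    """Same quantity through (2): v = x_a u^a - y with aggregated x_a and y."""
    e = signs(deviations(x, y, u))
    nu = len(u)
    x_agg = [sum(ei * row[a] for ei, row in zip(e, x)) for a in range(nu)]
    y_agg = sum(ei * yi for ei, yi in zip(e, y))
    return sum(x_agg[a] * u[a] for a in range(nu)) - y_agg


# Illustrative data (hypothetical): a constant variate and one other variate.
x = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 2, 2, 5]
u = [0.5, 1.0]
direct = summed_abs(x, y, u)
via_form_2 = summed_abs_linear_form(x, y, u)
```

Within any region of $E_\nu$ containing no root of (4) the aggregated coefficients do not change, which is the linearity asserted in Observation I.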

3. Existence of a Minimum. Define a “bend of degree r on S" to be the 
locus of all points on S whose u-coordinates satisfy a set of r independent 
equations of (4). To each set of r independent equations corresponds a unique 
bend of degree r. 

If a linear relation $u^\alpha = a^\alpha_\sigma \lambda^\sigma + b^\alpha$, $\sigma = 1, \dots, \mu < \nu$, 
rank $(a^\alpha_\sigma) = \mu$, is imposed on $u^\alpha$, all the preceding development, reduced 
in dimension, applies to the new variates $x^i_\alpha a^\alpha_\sigma$, $y^i - x^i_\alpha b^\alpha$. 

Observation II: A section of S by a plane of any dimension $d < \nu$ has all 
the properties of an S-surface of dimension d. 

Since any set of consistent equations selected from (4) determines such a 
linear relation for $u^\alpha$, the application of Observation I to any of the bends of S 
shows that each r-bend consists of linear elements of dimension $\nu - r$, joined 
at points which lie on linear elements of lesser dimension. Thus S is a 
polyhedron. Its faces we term complexes of dimension $\nu$, $C_\nu$, and the linear 
elements of its edges which lie wholly in bends of degree r, but not of degree r + 1, 
are complexes $C_{\nu-r}$ of dimension $\nu - r$. The boundary of any $C_\alpha$, $\alpha > 0$, 
consists of complexes of lesser dimension. The term complex is not restricted 
to either open or closed complexes. 

Since the function $v(u^\alpha)$ of (2) is non-negative, it possesses a greatest lower 
bound (g.l.b.) g. Since for some number h > g, there exists an N such that 
for all $| u^\alpha | > N$, $v(u^\alpha) > h$, it follows that for some closed neighborhood of $E_\nu$ 
the g.l.b. of v is g. Since v is continuous everywhere it attains its g.l.b., and 
so S has minimum points. Since the minimum of any complex not parallel 
to $E_\nu$ lies on its boundary, and the boundary consists of complexes, it follows 
that the minimum points of S consist of $C_0$'s and/or entire complexes of 
dimension > 0 which are parallel to $E_\nu$. The next section will show that S has a 
unique minimum complex (including of course its boundary complexes) and 
furthermore is cup-shaped. 


[Fig. 1] 



4. Convexity Property; Uniqueness of the Minimum. Consider $\nu = 1$ in 
the preceding treatment (and for convenience not written). S looks generally 
like Fig. 1. The slope changes only where an equation of (4) has a root. 
Suppose the point is $u_0$, and $x^1 u_0 - y^1 = 0$. From (3), since $v^1 > 0$, it follows 
that $e^1 x^1 < 0$ for $u < u_0$, $e^1 x^1 > 0$ for $u > u_0$. Since in (2) $x = \sum_i e^i x^i$, and 
since for h sufficiently small and $u_0 - h < u < u_0 + h$ the only e to change 
value² is $e^1$, we have that 

$x(u_1) + 2 | e^1 x^1 | = x(u_2)$ 

where 

$u_0 - h < u_1 < u_0 < u_2 < u_0 + h.$ 

² The e's corresponding to equations proportional to equation (1) also change value at $u_0$. 
This does not destroy the argument. 

Hence the slope is a monotonic increasing step function. Since for u 
sufficiently small all $e^i x^i < 0$, and for u sufficiently large all $e^i x^i > 0$, at some 
intermediate point or points either the slope is zero or it changes from negative to 
positive without becoming zero. In the first case a single closed $C_1$ is the 
minimum complex; in the second, a $C_0$. In either case the curve given by (2) 
when $\nu = 1$ is concave upward and has just one minimum complex, except for 
complexes of lesser dimension constituting the boundary of this complex. An 
obvious consequence is 

Lemma I. The set of points u for which v is less than some number N form a 
convex point set. 

This result is easily extended to the general dimension $\nu$. If for any two 
points $u_1$, $u_2$ of $E_\nu$, $v(u_1) < N$ and $v(u_2) < N$, the plane in $E_{\nu+1}$ given by 
$u^\alpha = u_1^\alpha + \lambda (u_2^\alpha - u_1^\alpha)$ makes a one-dimensional section of S. By Observation II, 
the points u lying on the projection of this section on $E_\nu$ have the property of 
Lemma I and of course lie on the straight line joining $u_1$ and $u_2$. This is the 
property required for a convex point set. Hence 

Theorem I. The set of points $u^\alpha$ of $E_\nu$ for which $v(u^\alpha)$ as given by (2) is less 
than a fixed quantity form a convex point set. 

From this it follows immediately that there is a unique minimum complex. 
It is appropriate here to point out that no two complexes can be contained in a 
single plane of the same dimension. This follows from the equation giving 
monotonicity of slope in one dimension, and Observation II. 

5. Gradient Directions. From here on the treatment will be of v as a function 
defined on $E_\nu$, and the equations will represent objects in $E_\nu$, unless otherwise 
stated. Complex and Bend also will refer to the projections on $E_\nu$ of the 
complexes and bends of S. For a single-valued function defined on $E_\nu$ the gradient 
at a point is the projection of a normal to the surface representing the function 
in $E_{\nu+1}$. If the function is defined only over a subspace of $E_\nu$ possessing 
derivatives, the gradient will be required also to be tangent to the subspace. This is 
sufficient to determine a unique direction, and preserves the property that for an 
infinitesimal displacement in any direction the value of the function decreases 
most rapidly in the direction of the gradient. Here gradient is taken negative 
to its usual sense. 

A point u lying on a $C_r$ but not on a $C_{r-1}$ will have a gradient in $C_r$ and also 
in each higher-dimensional complex on whose boundary $C_r$ lies. If the gradient 
for u as a point of $C_{r+k}$ points into $C_{r+k}$ (remembering that u lies on the boundary) 
this will be called a usable gradient. In the case of the greatest k for which 
there exists a usable gradient, there exists but one $C_{r+k}$ providing such a gradient, 
and that gradient is the “best” gradient; that is, of all directions in $E_\nu$ it 
provides the direction of most rapid decrease of the function v. This follows from 
Theorem I. Furthermore, all complexes of lesser dimension providing usable 
gradients lie on the boundary of this $C_{r+k}$. In fact 

Theorem II. If for a point u on $C_r$, two complexes $C_s$ and $C'_s$, s > r, lying 
in different bends of degree $\nu - s$ but incident at $C_r$, both provide usable gradients 
for u, then the complex $C_{s+1}$ on whose boundary lie both $C_s$ and $C'_s$ also provides a 
usable gradient for u. 

This follows from Theorem I. Select $u_1$ on the gradient in $C_s$, $u_2$ on the 
gradient in $C'_s$, for which $v(u_1) = v(u_2)$. The join of $u_1$ and $u_2$ lies in $C_{s+1}$, 
and for some point $u_3$ on this join, $v(u_3)$ is less than $v(u_1) = v(u_2)$. Also, the 
distance $u u_3$ is less than at least one of $u u_1$, $u u_2$. Hence $C_{s+1}$ must contain a 
usable gradient. 


6. Selection of Best Gradient at Bends. The direction of the gradient for a 
point $u_0$ considered as lying on a $C_\nu$ is given by 

(5) $g^\alpha(u_0) = -\sum_i e^i(u_0)\, x^i_\alpha.$ 

If $u_0$ lies in the interior of a face, this is unique. If $u_0$ lies in a bend, so that 
some $e^i$ are not determined, the $g^\alpha$ for each face is found by selecting the 
indeterminate e's as +1 or -1, according to the face being considered. 

For a point $u_0$ considered as lying on a bend of degree r, given by r 
independent equations of (4): 

(6) $x^\lambda_\alpha u^\alpha - y^\lambda = 0, \qquad (\lambda = 1, \dots, r)$ 

the gradient for a particular $C_{\nu-r}$, determined by the conditions at the 
beginning of section 5, is 

(7) $g^\alpha = x^\lambda_\alpha k_\lambda - x_\alpha$ 

where $k_\lambda$ satisfies 

$\sum_\alpha x^\lambda_\alpha x^\mu_\alpha k_\lambda = \sum_\alpha x_\alpha x^\mu_\alpha, \qquad (\mu = 1, \dots, r)$ 

and $x_\alpha$ is as given in (2), the choice of sign for the indeterminate $e^\lambda$ 
($\lambda = 1, \dots, r$) being immaterial. They may, in fact, be taken as 0 in this 
instance. 

For a point $u_0$ lying on an r-bend given by (6), to determine which complex 
contains the best gradient, each (r - 1)-bend incident on the r-bend at $u_0$ is 
tested for a usable gradient. Theorem II then determines the complex 
containing the best gradient. 

There are 2r such complexes incident at $u_0$, given by the r sets of equations 
selected from (6): 

(8) $(8\lambda)$: $x^\mu_\alpha u^\alpha - y^\mu = 0 \qquad (\mu = 1, \dots, \lambda - 1, \lambda + 1, \dots, r) \qquad (\lambda = 1, \dots, r).$ 

The two complexes lying in the same (r - 1)-bend have the same equations in 
(8), but are distinguished later by $e^\lambda(u_0)$ for the omitted equation being taken 
first +1, then -1. 

The gradient for the $\lambda$th pair of complexes is 

$g^\alpha_\lambda = x^\mu_\alpha k^{(\lambda)}_\mu - x_\alpha$ 

similar to (7), but not identical. For $e^\lambda = +1$ in determining $x_\alpha$, we have 
$g^\alpha_{\lambda+}$, and for $e^\lambda = -1$, $g^\alpha_{\lambda-}$. We restrict the consideration to $e^\lambda = +1$. 

The line in the direction of greatest slope is then 

$u^\alpha = u_0^\alpha + g^\alpha_{\lambda+} t.$ 

Now $u_0$ is here considered lying on the complex given by (8$\lambda$) with $e^\lambda = +1$. 
In order that $g^\alpha_{\lambda+}$ point into this face, the deviation for the $\lambda$th observation 
must exceed 0 when t > 0; otherwise, for a displacement in the direction of $g^\alpha_{\lambda+}$, 
$e^\lambda$ changes sign immediately and the course is in the other complex. This 
deviation is 

$v^\lambda = x^\lambda_\alpha u^\alpha - y^\lambda = x^\lambda_\alpha u_0^\alpha - y^\lambda + x^\lambda_\alpha g^\alpha_{\lambda+} t = x^\lambda_\alpha g^\alpha_{\lambda+} t.$ 

Had $g^\alpha_{\lambda-}$ been used, this deviation must be less than 0. Hence a necessary and 
sufficient condition that a complex given by (8) with either choice of $e^\lambda$ possess 
a usable gradient is 

(9) $\phi_{\lambda\pm} = e^\lambda \left[ \sum_\alpha x^\lambda_\alpha x^\mu_\alpha k^{(\lambda)}_\mu - \sum_\alpha x^\lambda_\alpha x_\alpha \right] > 0$ (no summation over $\lambda$). 

For r = 1 the condition is given by (9) with the first sum merely omitted. 
$\phi_{\lambda+}$ and $\phi_{\lambda-}$ cannot both exceed 0. 

When all sets of equations (8$\lambda$) are tested by (9) the equations common to 
all sets possessing a usable gradient determine the complex with the best 
gradient, retaining the values of e for which (9) was satisfied. 

7. Property of the Minimum Point. For a minimum point, given by (6) 
with $r = \nu$, all $\phi_\lambda$ must be negative. Define $X^{\mu\sigma} = \sum_\alpha x^\mu_\alpha x^\sigma_\alpha$ and 
$X^{\mu 0} = \sum_\alpha x^\mu_\alpha x_\alpha$ for convenience. Then in (9), the numbers $k_\sigma$, $-1$ are seen 
from their definition in (7) to be proportional to the cofactors of the $\lambda$th row of 
the matrix $(X^{\mu\sigma}, X^{\mu 0})$, $\mu$ having the same range as $\lambda$. Thus 
$\phi_{\lambda+} = c\,\mathrm{Det}\,(X^{\mu\sigma}, X^{\mu 0}_{+})$, and $\phi_{\lambda-} = -c\,\mathrm{Det}\,(X^{\mu\sigma}, X^{\mu 0}_{-})$, where in the 
first case $X^{\mu 0}$ is determined with $e^\lambda = +1$, in the second with $e^\lambda = -1$. The 
factor of proportionality, c, must be the same since $X^{\mu\sigma}$ is unaffected by change 
of $e^\lambda$. Now let $X^{\mu *} = \sum_\alpha x^\mu_\alpha x^*_\alpha$ where 
$x^*_\alpha = \sum_\kappa e^\kappa x^\kappa_\alpha$, the range of $\kappa$ omitting the range of $\lambda$. Then 

$\phi_{\lambda+} = c\,[\mathrm{Det}\,(X^{\mu\sigma}, X^{\mu *}) + \mathrm{Det}\,(X^{\mu\sigma}, X^{\mu\lambda})]$ 

and 

$\phi_{\lambda-} = -c\,[\mathrm{Det}\,(X^{\mu\sigma}, X^{\mu *}) - \mathrm{Det}\,(X^{\mu\sigma}, X^{\mu\lambda})].$ 

Hence 

$\phi_{\lambda+}\phi_{\lambda-} = -c^2 \{ [\mathrm{Det}\,(X^{\mu\sigma}, X^{\mu *})]^2 - [\mathrm{Det}\,(X^{\mu\sigma}, X^{\mu\lambda})]^2 \}.$ 

Now let A represent the square matrix $(x^\lambda_\alpha)$, $\alpha$ giving the rows and $\lambda$ the columns. 
Let $B_\lambda$ represent the matrix formed from A by replacing the $\lambda$th column by $x^*_\alpha$. 
Then 

$\phi_{\lambda+}\phi_{\lambda-} = -c^2 [\mathrm{Det}^2 (A'B_\lambda) - \mathrm{Det}^2 (A'A)]$ 

$= -c^2\, \mathrm{Det}^2 A\, (\mathrm{Det}^2 B_\lambda - \mathrm{Det}^2 A)$ 

and this will have the same sign as 

$\Psi_\lambda = | \mathrm{Det}\,(A) | - | \mathrm{Det}\,(B_\lambda) |.$ 

Since $\phi_{\lambda+}$ and $\phi_{\lambda-}$ are never both positive, and at the minimum are both 
negative for all $\lambda$, at the minimum all $\Psi_\lambda > 0$. To determine all $\Psi_\lambda$ together, let, 
in matrix notation, $z' = (z_1, \dots, z_\nu)$ and $x^{*\prime} = (x^*_1, \dots, x^*_\nu)$ where $x^*_\alpha$ were 
defined previously. Determine z as the solution of $Az = x^*$. Then $| \mathrm{Det}\,(B_\lambda) |$ 
are equal to $| z_\lambda |\,| \mathrm{Det}\,(A) |$. Hence a necessary and sufficient condition that 
$\Psi_\lambda > 0$ for all $\lambda$ is that all $| z_\lambda |$ be less than one. Hence 

Theorem III: If a zero-complex is given by a set of equations whose matrix is M, 
a necessary and sufficient condition that the complex be a unique minimum is that 
the solutions of $M'z = x^*$ be all less than one in absolute value. If k of the 
solutions are equal to one in absolute value, and the rest are less than one, the minimum 
is a complex of dimension k with the zero-complex as one of its corners. 

The last statement follows since if one solution is 1 in absolute value, a 
corresponding $\phi_\lambda = 0$, and hence no gradient, usable or not, exists. Thus the 
corresponding complex is parallel to $E_\nu$. 
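
The test of Theorem III lends itself to direct computation. The sketch below is illustrative only: the helper names `solve` and `theorem_iii_solutions`, the use of Gauss-Jordan elimination, and the five-point data set are assumptions made here, not the author's. It supposes that the $\nu$ chosen equations are independent and that no other observation has a zero deviation at the vertex.

```python
def solve(a, b):
    """Solve the square system a s = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [[float(c) for c in row] + [float(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def theorem_iii_solutions(x, y, bend):
    """Return the vertex u fixed by the equations in `bend` and the z of M'z = x*."""
    M = [x[i] for i in bend]
    u = solve(M, [y[i] for i in bend])          # the nu equations (4) at the vertex
    nu = len(u)
    x_star = [0.0] * nu
    for k, (row, yk) in enumerate(zip(x, y)):
        if k in bend:
            continue                             # kappa omits the range of lambda
        v = sum(r * ui for r, ui in zip(row, u)) - yk
        e = 1.0 if v >= 0 else -1.0
        for a in range(nu):
            x_star[a] += e * row[a]
    M_transposed = [[M[l][a] for l in range(nu)] for a in range(nu)]
    return u, solve(M_transposed, x_star)


# Hypothetical data: a constant variate and one other variate, five observations.
x = [[1, 0], [1, 1], [1, 2], [1, 4], [1, 7]]
y = [0, 2, 1, 5, 6]
u_min, z_min = theorem_iii_solutions(x, y, [0, 4])   # all |z| < 1: a minimum
u_bad, z_bad = theorem_iii_solutions(x, y, [1, 4])   # some |z| > 1: not a minimum
```

For these data the vertex through the first and last observations passes the test, while the vertex through the second and last does not; a direct comparison of the summed deviations (24/7 against 4) agrees.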

8. Minimization for One Dimension. A method for minimization of (2) when 
there is just one parameter evolves from the monotonicity of slope in that case. 
Suppose the variates are $w^i$ and $z^i$, and (1) is 

(10) $v^i = w^i t - z^i.$ 

Suppose the variates are arranged in order of $z^i / w^i$, starting with the smallest. 
The slope of the rth segment (Fig. 1) from the left is 

$\sum_{i=1}^{r} | w^i | - \sum_{i=r+1}^{n} | w^i |.$ 

The minimum occurs when the slope is 0 or changes from negative to positive; 
that is, when the first sum equals or exceeds the second; or when the first sum 
equals or exceeds half the total. This is a standard computation. If the 
change takes place when r = k, then $t = z^k / w^k$ is the value of t giving the 
minimum. 
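
The one-parameter computation just described may be sketched as follows. The function name and the four-observation example are assumptions for illustration; every $w^i$ is taken to be different from zero.

```python
def minimize_one_parameter(w, z):
    """Return a t minimizing sum_i |w[i]*t - z[i]|, all w[i] assumed nonzero."""
    # Arrange the observations in order of z/w, starting with the smallest.
    order = sorted(range(len(w)), key=lambda i: z[i] / w[i])
    half_total = sum(abs(wi) for wi in w) / 2.0
    running = 0.0
    for i in order:
        running += abs(w[i])
        # Stop when the accumulated sum first equals or exceeds half the total.
        if running >= half_total:
            return z[i] / w[i]


def summed_abs(w, z, t):
    return sum(abs(wi * t - zi) for wi, zi in zip(w, z))


# Hypothetical example: ratios z/w are 4, 1, -1, 3 with weights 1, 2, 1, 3.
w = [1, 2, 1, 3]
z = [4, 2, -1, 9]
t_min = minimize_one_parameter(w, z)
```

Here the ordered ratios are -1, 1, 3, 4 with accumulated $| w |$ of 1, 3, 6, 7; half the total is 3.5, first reached at the ratio 3, so the sketch returns t = 3.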

9. Minimization Procedure for $\nu + 1$ Dimensions. For any continuous 
function with unique minimum and having the property of Theorem I, the following 
holds. Let $u_i$ be any point of $E_\nu$. Let $u_{i+1} = u_i + \lambda_i t_i$, where $\lambda_i$ is any 
direction chosen at random and $t_i$ is the value of t for which the function attains 
a minimum on the curve $u = u_i + \lambda_i t$. Then the probability is one that 
$\lim_{i \to \infty} u_i = \bar{u}$, where $\bar{u}$ is a minimum point for the function. If $\lambda_i$ is taken 
always as the gradient of $u_i$, such a procedure is called the “method of steepest 
descent” for approaching the minimum point. 

Usually the limit is never attained. In this case, however, the minimum is 
attained. The minimum can be approached as closely as desired, hence a 
complex incident on the minimum is reached. But the convex point sets of 
Theorem I surrounding the minimum complex are all similar convex 
polyhedrons in $E_\nu$, whose corresponding faces are parallel, and the gradients at 
points on a bend cannot point into a higher dimensional complex on the bend. 
Hence the sequence of points lie on bends of successively greater degree, and 
must eventually attain the minimum complex. 

[TABLE I. Entries illegible; the one legible row reads 39.990, -4036.242, -120.570.] 

TABLE II 
Points $u_k$ 

$u_{k+1} = u_k + g_k t_k$ 

$u_0$ = (38, -5, -2) 

$u_1$ = (37.98202, -4.74828, -1.48457) 

$u_2$ = (37.45908, -2.07142, -1.85631) 

$u_3$ = (32.83333, -2.07142, -1.76191) 


TABLE III 

Computation of $t_k = z_k / w_k$ 

Sum            in order of col.   exceeds   at i =   hence $t_k$ = 
$\sum | w_0 |$   (10)               17521     16       .00599334 
$\sum | w_1 |$   (15)               2502      2        .0000397792 
$\sum | w_2 |$   (20)               4610      10       .00000496545 


TABLE IV 

Gradients $g^\alpha_k$ for column (5k + 8) 

k    $g^1_k$     $g^2_k$    $g^3_k$ 
0    -3          42         86 
1    -13146      67293      -9345 
2    -931588     0          19012 
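
The entries of Table II can be checked against Tables III and IV through the relation $u_{k+1} = u_k + g_k t_k$. In the sketch below the values $t_1 = .0000397792$ and $t_2 = .00000496545$ are taken with the decimal placement that reproduces Table II from the tabulated gradients; this placement is an inference from the arithmetic, and the variable names are illustrative.

```python
# Starting point (Table II, line 1) and gradients (Table IV).
points = [(38.0, -5.0, -2.0)]
gradients = [(-3, 42, 86), (-13146, 67293, -9345), (-931588, 0, 19012)]
# Step lengths t_k; t_1 and t_2 use the decimal placement inferred from Table II.
steps = [0.00599334, 0.0000397792, 0.00000496545]

for g_k, t_k in zip(gradients, steps):
    u_k = points[-1]
    points.append(tuple(ua + ga * t_k for ua, ga in zip(u_k, g_k)))

# Points as printed in Table II, for comparison.
table_ii = [
    (38, -5, -2),
    (37.98202, -4.74828, -1.48457),
    (37.45908, -2.07142, -1.85631),
    (32.83333, -2.07142, -1.76191),
]
largest_gap = max(
    abs(a - b) for p, q in zip(points, table_ii) for a, b in zip(p, q)
)
```

The recomputed points agree with the printed ones to the five decimals shown.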


The computational procedure is as follows: 

1. Select a point $u_0$. 

2. Determine the gradient $g_0^\alpha$ from (5). 

3. Compute $w_0^i = x^i_\alpha g_0^\alpha$, $z_0^i = y^i - x^i_\alpha u_0^\alpha$. 

4. Determine $t_0$ by the method of section 8. 

5. Compute $u_1^\alpha = u_0^\alpha + g_0^\alpha t_0$. 

6. Determine the complex containing the best gradient by (9), and the 
gradient $g_1^\alpha$ by (7). 

and so proceed to the minimum. This may be finally tested by Theorem III. 








Step 5 is unnecessary, since the only use for $u_1^\alpha$ is to determine $e^i(u_1)$. But 
$e^i(u_1) = e^i(t_0)$, the latter referring to the computation in step 4. Also, after 
the first step, it is easier to compute $z^i$ by 

$z_{k+1} = z_k - w_k t_k.$ 
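
Steps 2 to 5, with the recurrence for z, may be sketched as below. This is a partial illustration under stated assumptions, not the full procedure: it always uses the plain gradient (5), taking $e^i = 0$ for a vanishing deviation, and omits the selection of the best gradient at a bend by (9) and (7) required in step 6. It may therefore stop moving once a bend is reached. The function names and the data are hypothetical.

```python
def line_minimum(w, z):
    """Section 8: t minimizing sum_i |w[i]*t - z[i]|, ignoring terms with w[i] == 0."""
    idx = [i for i in range(len(w)) if w[i] != 0]
    idx.sort(key=lambda i: z[i] / w[i])
    half_total = sum(abs(w[i]) for i in idx) / 2.0
    running = 0.0
    for i in idx:
        running += abs(w[i])
        if running >= half_total:
            return z[i] / w[i]
    return 0.0  # every w[i] is zero: v does not change along this direction


def descent_step(x, u, z):
    """One pass of steps 2 to 5, given z^i = y^i - x^i_a u^a at the current u."""
    nu = len(u)
    g = [0.0] * nu
    for row, zi in zip(x, z):
        e = (zi < 0) - (zi > 0)          # sign of the deviation v^i = -z^i
        for a in range(nu):
            g[a] -= e * row[a]           # (5): g = -sum_i e^i x^i
    w = [sum(r * ga for r, ga in zip(row, g)) for row in x]   # step 3
    t = line_minimum(w, z)                                     # step 4
    u_next = [ua + ga * t for ua, ga in zip(u, g)]             # step 5
    z_next = [zi - wi * t for zi, wi in zip(z, w)]             # z_{k+1} = z_k - w_k t_k
    return u_next, z_next


# Hypothetical data and starting point.
x = [[1, 0], [1, 1], [1, 2], [1, 4], [1, 7]]
y = [0, 2, 1, 5, 6]
u = [0.0, 0.0]
z = [float(yi) for yi in y]
history = [sum(abs(zi) for zi in z)]
for _ in range(3):
    u, z = descent_step(x, u, z)
    history.append(sum(abs(zi) for zi in z))
```

On these data the first step lowers the summed deviations from 14 to 60/17; later steps with the plain gradient make no further progress, which is the situation that the test (9) and the gradient (7) are designed to resolve.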

10. Example. The computation for (9) is not so great as it would seem, since 
some of the work is duplication and some must be computed anyway for the 
gradient. Even so, for r > 3 it becomes, perhaps, more arduous than its 
contribution would seem to justify. For $\nu > 4$ it is recommended that the 
test of (9) be omitted for points on bends of third degree or greater, and the 
final test of Theorem III be applied at the end of the work. If this test shows 
the minimum has not been reached, the complex in which lies the best gradient 
will be indicated at the same time. 

The minimum number of steps is 0. The maximum number is tremendous 
but finite. The expected number is probably a little greater than $\nu$. 

In Tables I to IV, the method is applied to the problem used by Rhodes to 
illustrate his method. The independent variates are shown in columns (2), (3), 
(4), Table I, the dependent variate in column (5). The only other original 
datum is the initial point, selected by guess, shown in line 1, Table II. Since 
slightly different formulas were used in the computation, the signs of cols. 
(6), (8), (11), (16), (18) are reversed, and the gradients in Table IV are 
multiplied by constants. As they are used only for directions, this does not 
matter. 

Princeton University, 

Princeton, N. J. 



A STUDY OF A UNIVERSE OF n FINITE POPULATIONS WITH 
APPLICATION TO MOMENT-FUNCTION ADJUSTMENTS 
FOR GROUPED DATA 

By Joseph A. Pierce 

The object of this paper is to study the case of a universe of n finite popula¬ 
tions, considering both the expectations of population moment-functions and 
the moments of sample moments, and to make applications of the results which 
may be of interest to mathematical statisticians. The sampling formulas which 
are derived reduce to the usual infinite or finite sampling formulas, under 
appropriate assumptions. Also a method is given whereby finite sampling 
formulas may be transformed into the corresponding infinite sampling formulas. 

The general methods and formulas which are given in Part I for the expecta¬ 
tions of population moment-functions are used, in Part II, to find the expecta¬ 
tions of moments of a distribution of discrete data grouped in “k groupings 
of k". 


I. A Study of a Universe of n Finite Populations 

Let ${}_nU_N$ be a universe composed of the set of populations ${}_rX$, (r = 1, 2, ..., n), 
each population ${}_rX$ consisting of a finite number of discrete variates ${}_rx_i$, 
(i = 1, 2, ..., N), (N > n). The tth moment of ${}_rX$ is denoted by ${}_r\mu_t$. The 
tth central moment of ${}_rX$ is denoted by ${}_r\bar{\mu}_t$. The tth moment and the tth central 
moment of ${}_nU_N$ are respectively denoted by $\mu_t$ and $\bar{\mu}_t$. The expected value of a 
variable y is denoted by E(y). We have 

| N j » 

rUt — E(.rXi) — ^£3 rXi , rfit ~ E(& r/il) = 23 (r** »Ml) » 

1 " 1 " 

(1.1) Mian = = - 23 Tilt , HV . ji , = E(rfi t) = - 23 rfit , 

' ' 71 r—1 71 r—1 

~ * * * r v Ptl)t 

• '*v'Pt x h J* • •#»!„ = ••• r v fit v )‘ 

Wc also note that , ■,.. ^ may be written jum.. * 

1. The expected value of moments and central moments. It follows easily 
from (1.1) that 

( 1 . 2 ) 


Mian *= Mi • 





From the usual formula for central moments in terms of moments, we get 
(1-3) Mi.fi = 2 (-lyQ Mfl*,n,-i • 

Terms of the form m«:h,i»,_< may be evaluated by use of the well known formulas 
[20; p. 58] for changing from moments to central moments in the case of a multi¬ 
variate distribution. Two of these formulas are given below. 

ihliwi = UlUiian ~ Ml0: <la |i»M01:i. 4 M k • 

MU1:*.MW1« = Mlll:» 0 M6«c — Mll0:ii,j>iii e M00l:fi a ii»ii l , 

(1.4) 

+ 2 Ml00:M,M»«.M0l0:K o ii W i«Mci0l:ii„ii W i e • 

We find that 

t! 

(I-®) M<1:mimi-i ~ 2 MfciiMlii-i i 

where pip* is a two-part partition of i and n + r 2 = 1. 

Using (1.3) and (1.5), we get 

(1.6) Mlsit = MS ~ MS:<i| • 

(1.7) Ml:i, = M* - 3AUVII4 + 6 miMS:/.i + 2mS:mi • 

(1.8) Ml:i 4 = M4 + 6(m* ~ 2/txf) - 12/njS 3:Ml + 12 miMu:h,h, 

— 4mh: M iM , + 6 mm;„ im - 3jS4;^, . 

etc. 

If the n populations are identical, it is evident from the definition of mi:Ai 
that, for all finite t, 

Ml:i, = Mi • 


2. The expected value of Thiele seminvariants. If the tth Thiele 
seminvariant is denoted by $\lambda_t$, then 

fi n\ (-lr'fKp-l)! 

u ' Mi:x, i 8i y 8i , ... «,[(2!)m(3J).. .. .'(„[)•* » 

the summation being taken for all positive integers «,(i = 1, 2, - ■ • v), for which 


P ® t t — ^ t 8 i * 

1-1 1-1 

Terms of the form m«,», .. • • •»« are evaluated by (1.4). We have 

(1.10) Ml:X, = Xs - Mi:,!, . 





(1.11) Ml:X, = X* — + 6 Xij \i%:n + . 

(1.12) Ml:X 4 = X 4 + 12[X2 ~ 2Xil fiiifti — 24Xi/I3:mi + 24Xi/Zii: MlM2 

"H 12 /Zn : MlMl — — 6/24: Ml . 

etc. 

If the n populations are identical then, for all finite t, 

Ml:X, = X<. 

3. Generalized sampling. It follows from definition that all rational isobaric 
moment-functions have the property that they may be expressed in terms of 
power sums and power product sums with certain coefficients. Of the power 
sums and power product sums which enter a sampling formula only the power 
product sums take different forms depending on the law of variate selection. 
Now, there are two possible courses which may be followed by one who wishes to 
derive sampling formulas for the case of a single population. 

1. One may decide in advance on the law which he wishes to govern the 
selection of variates which enter the sample. Then he may apply this law in 
the evaluation, in terms of moments, of every power product term as it occurs 
in each formula which is derived. 

2. One may derive the formulas for sampling under the condition that the 
law is unspecified, thereby obtaining formulas which are capable of being 
interpreted in terms of laws that are decided upon later. 

We illustrate the two possible courses by considering the formula, 

(1.13) fix, - r -2x + 

which Carver [12; p. 102] obtains for the case of finite sampling without 
replacements. Here r = the number in the sample, s = the number in the parent 
population and $z_i$ = the algebraic sum of the variates of the ith sample. Later, 
by evaluating $\sum x^2$ and $\sum x_i x_j$ in terms of moments, he finds 

(1.14) $\mu_{2:z} = r\,\mu_{2:x}\,\dfrac{s - r}{s - 1}.$ 

(It should be noted that Carver [12; p. 115] obtained the corresponding formula 
for infinite sampling by letting $s \to \infty$.) 

The preceding development is entirely in accord with the first of the courses 
stated above. It is also the standard procedure and is the course followed by 
such writers as Isserlis [2], Neyman [6], Church [7], Pepper [11] and Dwyer [20], 
in deriving finite sampling formulas. Also, it is the course followed by such 
authors as “Student” [1], Tchouproff [3], Church [5], Craig [9], Fisher [10], and 
Georgesque [13] for the case of sampling from an infinite population. 





However, in (1.13), it is possible to employ the definition, 

$s(s - 1)\,\mu_{11} = \sum_{i \neq j} x_i x_j.$ 

Then (1.14) becomes 

(1.15) $\mu_{2:z} = r\mu_2 + r(r - 1)\mu_{11}.$ 

Formula (1.15) may be interpreted as holding for either finite or infinite 
sampling, depending on the interpretation which is given to $\mu_{11}$. It may be 
easily shown that, if the sampling is from a limited supply, $\mu_{11} = -\dfrac{1}{s - 1}\mu_2$ and 
(1.15) reduces to (1.14). If the sampling is from an infinite supply, $\mu_{11}$ becomes 
$\mu_1^2$ and therefore 

$\mu_{2:z} = r\mu_{2:x},$ 

which is the formula [12; p. 115] that corresponds, in the infinite case, to (1.14). 
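
The agreement of (1.14) and (1.15) in the finite case can be confirmed by enumerating every sample. The sketch below reads (1.14) as $\mu_{2:z} = r \mu_2 (s - r)/(s - 1)$ and (1.15) as $\mu_{2:z} = r\mu_2 + r(r - 1)\mu_{11}$, with the variates measured from the population mean; these readings, the function name, and the five-member population are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def check_finite_sampling(population, r):
    """Second moment of the sample sum z by enumeration, by (1.14), and by (1.15)."""
    s = len(population)
    mean = Fraction(sum(population), s)
    d = [Fraction(xi) - mean for xi in population]     # variates about the mean
    mu2 = sum(di * di for di in d) / s
    mu11 = sum(d[i] * d[j] for i in range(s) for j in range(s) if i != j) / (s * (s - 1))
    # All samples of r without replacement; the mean of z is zero for deviations.
    sums = [sum(sample) for sample in combinations(d, r)]
    enumerated = sum(zz * zz for zz in sums) / len(sums)
    by_1_14 = r * mu2 * Fraction(s - r, s - 1)
    by_1_15 = r * mu2 + r * (r - 1) * mu11
    return enumerated, by_1_14, by_1_15, mu2, mu11


enumerated, by_1_14, by_1_15, mu2, mu11 = check_finite_sampling([1, 2, 4, 7, 11], 3)
```

Exact rational arithmetic is used so that the three values can be compared for equality rather than to a tolerance.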

Thus, either of the two courses is possible in the case of sampling from a single 
population. However, if one wishes to get general formulas which hold for both 
infinite and finite sampling, he should follow the second course. Similarly, in 
order to obtain generalized sampling formulas where the relations between the 
variates are unspecified and the populations are assumed to be different, the 
second course should be followed. 

It appears that Tchouproff [3], [4] was the first to approach the sampling 
problem from such a general point of view. However, his methods of derivation 
are quite complicated and his results, in general, are difficult to apply to a given 
problem [5], [8]. 

Samples of n are formed from ${}_nU_N$ by choosing one variate from each of the n 
populations. A typical sample is 


lXi\ , 2 Xi t , iXi | y * • • , r%i r y * • • , nX% n . 

We define [4; p. 472] 


(1.16) 


r £ <■»*<!, • • • r,Xil t = E( r,*}; • • • tA',) 

™ *r»*tr«* * * 


Xr \' x *t ’ ‘ %r 
rjf*rk 


rjfj* • • •<* y 


~rk r £ r ‘ r *^ ■ r ' IU ' t '" 


n 


(*) 


S v 




— i 


where k represents the number of possible terms of the given form; $S_v$ means v 
times the sum for unequal values of $r_1, r_2, \dots, r_v$; and $n^{(v)} = n(n - 1) \cdots (n - v + 1)$. 


4. Moments and product moments of sample moments. The tth moment of 
the jth sample is denoted by ${}_jm_t$. The sth moment of ${}_jm_t$ for all j is denoted by 
$'\mu_{s:m_t}$, where the prime indicates that the moments of the universe are measured 
about a fixed point. It follows that 





1 Ml* _ 

(1.17) - - £ rX l ir and 'n. M , * El-m t Y. 

1% r—1 

Also, the general product moment, in which the variates of both the sample 
and the universe are measured about a fixed point, is defined by 

(1.18) ~ ••• j 

As an illustration of the methods used to derive the formulas of this section, 
consider a special case of (1.18) when «i = 2 and «< = 0, (i = 2,3, • • ■ , t>). Then 



Therefore, by (1.1), (1.2) and (1.16), we get 

(1.19) '/**:», = 

Using the formulas [20; p. 34] relating products of power sums and power 
products to expand expressions of the type EQm ‘,j ,mjj • • • we give, in the 
tables below, formulas for moments and product moments of sample moments 
through weight six. The number in a cell and the coefficient, in the same 
column, at the top of the table should be taken as the coefficient of the moment 
which is found in the same vertical division. The -coefficients in the vertical 
division are coefficients of the entire right members of the formulas for the 
respective moments. 

Terms of the form if ti = U = • • • = t, — t, are sometimes written 

The numbers in the cells of the tables are identical with the numbers in the 
cells of the tables given by Dwyer [19; p. 30] for the expected value of partition 
products. 

5. Moments of central moments of samples of n. The tth central moment of 
the jth sample is denoted by ${}_j\bar{m}_t$. Then, 

(1.20) /Hit - - £ (rX <r - itniY 

n r-1 

and 

(1.21) =s[jZU,- i*»l)‘l . 

Ltt r-1 J 





TABLE I 


(1) 

(2) 


(3) 




Coef. 

Coef. 

n<‘> 

Coef. 

n 

n<*> 

n (,) 

mi 

Ml 

Mi* 


M. 

Mt.l 

Mi* 


'Mi: mi 

"Mi:m* 

nr 1 

1 

Vllma 

n~ l 

1 i 




'Mi:mi 

n~* 

1 

Ml i:miw2 

n~* 

1 

• 1 






# Ms:»i 

n~ l 

1 

3 

1 



Coef. 

n 

n«> 

n<*> 

„<*> 

n (s) 

n (4) 

»»> 









M6 

M4.1 

MS. 1 

M».l* 

Ml*.i 

Ml.l a 

Ml" 


(4 





Vi:m» 

n ”* 1 

1 







Coef. 

n 

n<*> 

n<« 

»w 

n (4) 

Mlllmim* 

n “ 2 

1 

1 







M< 

Ms. i 

M2.2 

M2> I s 

Ml 4 

Ml llmjma 

n 2 

1 


1 




V l : »«4 

*r l 

1 





VsKmima 

n ~ 3 

1 

2 

1 

1 



MlKmimj 

n“ f 

1 

1 




Vnimjtnj 

n “ 8 

1 

1 

2 


1 


Vaimj 

to 2 

1 


1 



Vsi^mims 

n “ 4 

1 

3 

4 

3 

3 

l : 

Vsilmim* 

to'* 

1 1 

_ i 

i 2 

! i 

t 1 

■ r 


V*.mi 

n~* 

1 

5 

10 

10 

15 

1° 

1 Wm, 

?r 4 

1 


3 

1 

6 

1 











After writing (^r,- r — ,mi)‘ as the sum of the general term of a binomial series 
and then expanding the resulting right member of (1.21) as a product of power 
sums [20; p. 19], we get 


(1.22) 




-E 


s! 

ri!r*! • • • r„!iri!irj! • • • 


z 


*1^*2^* 


0 

J*iv 






j** 1 !■"<!* * 


where ]C r i = 2 h r i = P and in , *■*, • • • are the numbers of the repeated 

7-1 7-1 

parts of 8 . 

The mean of the tth central moment takes the following simple form, 

(1.23) Vl:5, = g (•-1 lY Q W,-«, 


where the moments in the right member of (1.23) through weight six are given 
in the tables of section four. Also, 


(1.24) Pi:m 2 == M2:m 2 2 /X|i:m 1 m 2 4” M4:m! • 

(1.25) M8:** 2 :=: M3:m 2 3 M22:m,m 2 4"" 3 M&»»i • 

^ M2:mj “ M2:mj 4” 9 M22:m|«n 2 4“ 4 M6:mj 6 Mlll:mim s m| 

4“ 4 MSl.'mjmi 12 M4l;m,m 2 • 


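After substituting from the tables of section four, (1.23) through (1.26) become 

(1.27) $'\mu_{1:\bar{m}_2} = \dfrac{n^{(2)}}{n^2} [\mu_2 - \mu_{1^2}].$ 

(1.28) $'\mu_{1:\bar{m}_3} = \dfrac{n^{(3)}}{n^3} [\mu_3 - 3\mu_{2,1} + 2\mu_{1^3}].$ 

(1.29) $'\mu_{1:\bar{m}_4} = \dfrac{1}{n^4} [n^{(2)}(n^2 - 3n + 3)(\mu_4 - 4\mu_{3,1}) + 3n^{(2)}(2n - 3)\mu_{2,2}$ 

$+ 3n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})].$ 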

W, = 4 [n <l> (n s - 2n + 2)(/t t - 5/n,i) + 10n (,) (n - 2)/n,i . 

Tfr 

+ 10n (,) (n + l)(n — 4)/u.ii - 30n (,) (n — 2)/i*»,i 
— 10n (4, (3n — 4)/ti,ii 4n^/tti]. 


(1.30) 





Vi:m» = ^ ~ 5n* + 10» ! - lOn + 5)(m« - mm) 

ft 

+ 15n <J V - 4n s + 7n - 5)m4.s - 10n (,) (2n* - 6n + 5)ms« 

(1.31) + 15n (, V - 4n* + 6n - 5)mm» - 60n ( *V - 4n + 5)m» A i 
+ 15n (,) (3n — 5)mji — 20n <4) (n J — 3n + 5)ms,i< 

+ 45n (4) (2n — 5)mi«,i« + 15n <w (n — 5)m*.i* — 

(1.32) 'fit-.m, = i[n <8) (n — 1)(m4 — 4/x»,i) + n w (n + 1)mi,* — tt <4) (2Mn* — mm))- 

TT 

'nr.mt — i[n (2) (n — 1) 2 (m« — 6/is,i) + 3n <2) (n — l)(n* — 2 n + 5)m.t 
w 

- 2n (2) (3n 2 - 6n + 5) MJl8 + n m (n 3 - 3n‘ + 9n - 15) w . 

(1.33) — 3n U) (n - 1 )(n - 5)m4,i« — 12n (,) (n* -4n + 5)msai 
+ 4n <4) (3» — 5)ms,i» — 3n <4) (n 2 -6n + 15)mj«,i« 

+ w (#) (3mj,i 4 — mi*)]* 

W*. = -,l»% - !)*(» - 2)On - 6 mm) - 3n <2, (n - 2) 2 (2n - 5Wi 
+ — 2)*(n 2 — 2n -f* 10 )/j 3 ,s 

(1.34) - 6n (,) (n - 2)(n 2 - 6n + 20)ms.u + 3n <3) (» - 2)(7n - 10)mm* 
+ 3» (,) (3n* - 12n + 20W + 4n (4> (n - 2)(n - 10)mm* 

+ 9n <4) (n* — 8» + 20)ms>.i> — 4» w, (3mj.i 4 — Mi*)]- 

6. The variance of the variance of samples of n. The variance of the variance of samples of n, when the moments of the universe are measured about a fixed point, is defined as

(1.35) $\bar\mu_{2:\bar m_2} = \mu_{2:\bar m_2} - [\mu_{1:\bar m_2}]^2$.

Therefore, from (1.27) and (1.32),

(1.36) $\bar\mu_{2:\bar m_2} = \dfrac{1}{n^{4}}\,[n^{(2)}(n-1)(\mu_4 - 4\mu_{3,1}) + n^{(2)}(n^2 - 2n + 3)\mu_{2^2} - n^{(4)}(2\mu_{2,1^2} - \mu_{1^4})] - \left(\dfrac{n-1}{n}\right)^{2}(\mu_2 - \mu_{1^2})^2$.

Tchouproff [4; p. 492] gave a formula (8) for the variance of the sample variance, but his result is unwieldy because the moments of the universe are measured about the mean.





7. Conventional infinite sampling formulas derived from generalized sampling formulas. The term “infinite sampling” is to be interpreted as meaning: sampling from an unlimited supply or sampling from a limited supply with repetitions permitted. In each of these situations the variates are independent [5; p. 79].

First, it is assumed that the n populations are identical, that is, ${}_1X = {}_2X = \cdots = {}_nX$. This assumption results in the fact that, for a fixed t, ${}_1\mu_t = {}_2\mu_t = \cdots = {}_n\mu_t$ and ${}_1\bar\mu_t = {}_2\bar\mu_t = \cdots = {}_n\bar\mu_t$. Therefore, under the assumption of identical populations, every moment may be interpreted as either the moment of n identical populations or as the moment of a single population. The only other assumption is that the sampling is “infinite”.

From the condition of independence [3; p. 141], we have 


„<) - (ErSi'JiEr^) .. . (£„<). 


Therefore, 


r,r, ■ ■ ■ ■<„ — r,M«i rjMlj ••• r,Ml, • 


Combining the condition of independence with that of identical populations, we 
have 

(1.37) — (JJ Sv - S, r ,Ml, r t Mh • • • = M«i Mlj • • ' • 


By (1.16) and (1.37), we may write

(1.38) $\mu_{i_1 i_2 \cdots i_n} = \mu_{i_1}\mu_{i_2}\cdots\mu_{i_n}$.

Since the only terms of the generalized sampling formulas which are affected by the assumption of “infinite sampling” are those of the form $\mu_{i_1 i_2 \cdots i_n}$, the problem of obtaining conventional infinite sampling formulas from generalized sampling formulas is, in practice, a mechanical one. Simply write terms of the form $\mu_{i_1 i_2 \cdots i_n}$, which appear in a generalized sampling formula, as $\mu_{i_1}\mu_{i_2}\cdots\mu_{i_n}$ and one automatically obtains the corresponding infinite sampling formula.

As an illustration of the method, consider the generalized sampling formula (1.36) for the variance of the sample variance. When (1.38) is utilized to change it into the corresponding infinite sampling formula, (1.36) becomes

(1.39) $\bar\mu_{2:\bar m_2} = \dfrac{n^{(2)}}{n^{4}}\,[(n-1)(\mu_4 - 4\mu_3\mu_1) - (n-3)\mu_2^2 + 2(2n-3)(2\mu_2\mu_1^2 - \mu_1^4)]$,

which is the usual formula [20; p. 75] for the variance of the sample variance when the moments of the universe are measured about a fixed point. If it is assumed that the moments of the universe are measured about the mean, formula (1.39) becomes

(1.40) $\bar\mu_{2:\bar m_2} = \dfrac{n^{(2)}}{n^{4}}\,[(n-1)\bar\mu_4 - (n-3)\bar\mu_2^2]$,

which was published by “Student” [1; p. 3] in 1908.
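
The infinite sampling formula for the variance of the sample variance can be checked numerically. The Python sketch below is not part of the original paper: it enumerates every ordered sample of n drawn with repetitions from a small population whose values are chosen arbitrarily for illustration, and compares the exact variance of the sample variance (divisor n) with the fixed-point form of the formula. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def raw_moment(values, t):
    """t-th moment of the population about the origin (a fixed point)."""
    return sum(Fraction(v) ** t for v in values) / len(values)


def sample_variance(sample):
    """Second central moment of a sample, with divisor n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n


def exact_variance_of_sample_variance(population, n):
    """Enumerate all ordered samples of n drawn with repetitions."""
    values = [sample_variance(s) for s in product(population, repeat=n)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def infinite_sampling_formula(population, n):
    """Variance of the sample variance in terms of moments about a fixed point."""
    m1, m2, m3, m4 = (raw_moment(population, t) for t in (1, 2, 3, 4))
    bracket = ((n - 1) * (m4 - 4 * m3 * m1)
               - (n - 3) * m2 ** 2
               + 2 * (2 * n - 3) * (2 * m2 * m1 ** 2 - m1 ** 4))
    # n^(2) / n^4 = n(n - 1) / n^4 = (n - 1) / n^3
    return Fraction(n - 1, n ** 3) * bracket


population = [1, 2, 4, 7]  # hypothetical values, not taken from the paper
for n in (2, 3, 4):
    exact = exact_variance_of_sample_variance(population, n)
    print(n, exact, exact == infinite_sampling_formula(population, n))
```

Exact rational arithmetic is used so that agreement is an identity rather than a floating-point coincidence.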





8. Conventional finite sampling formulas derived from generalized sampling formulas. The term “finite sampling” is to be interpreted as meaning: sampling from a limited supply when repetitions are not permitted.

In order to reduce generalized sampling formulas to the corresponding formulas for finite sampling, the assumptions are made that the n populations are identical and that N and n are finite, N > n. The selection of variates which enter each sample is restricted in the following manner. If a variate having a given post-subscript is chosen, then no other variate having the same post-subscript may be chosen for the same sample.

Now it is evident that terms of the form $\mu_{i_1 i_2 \cdots i_n}$ must be redefined on the basis of the preceding assumptions. From the expansions [20; p. 32] of power product sums in terms of products of power sums, we get the formulas for $\mu_{i_1 i_2 \cdots}$ which are given in the following tables.

The formulas in the tables of this section are called transformation formulas for finite sampling or, more briefly, transformation formulas.

The transformation of generalized sampling formulas into corresponding finite sampling formulas is illustrated by the substitution of $(N\mu_1^2 - \mu_2)/(N - 1)$ for $\mu_{1^2}$ in (1.27). We get

(1.41) $\mu_{1:\bar m_2} = \dfrac{N(n-1)}{n(N-1)}\,[\mu_2 - \mu_1^2]$,

which is the well-known finite sampling formula for the mean of the variance of samples of n.
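
As a check on the finite sampling formula for the mean of the sample variance, the following Python sketch (an illustration added here, not part of the original paper) averages the sample variance over every sample of n drawn without repetitions from a small population and compares the result with $\frac{N(n-1)}{n(N-1)}[\mu_2 - \mu_1^2]$. The population values and function names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(sample):
    """Second central moment of a sample, with divisor n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n


def exact_mean_of_sample_variance(population, n):
    """Average over all samples of n drawn without repetitions."""
    values = [sample_variance(s) for s in combinations(population, n)]
    return sum(values) / len(values)


def finite_sampling_formula(population, n):
    """N(n - 1) / (n(N - 1)) times (mu_2 - mu_1^2), moments about the origin."""
    N = len(population)
    m1 = Fraction(sum(population), N)
    m2 = Fraction(sum(x * x for x in population), N)
    return Fraction(N * (n - 1), n * (N - 1)) * (m2 - m1 ** 2)


population = [1, 2, 2, 5, 9, 10]  # hypothetical values
for n in (2, 3, 4, 5):
    exact = exact_mean_of_sample_variance(population, n)
    print(n, exact, exact == finite_sampling_formula(population, n))
```

Positions in the list are treated as distinct individuals, so repeated values in the population are handled correctly.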

From this and the preceding section it is evident that the generalized sampling formulas may be considered as formulas for either infinite or finite sampling depending upon the interpretation given to terms of the form $\mu_{i_1 i_2 \cdots i_n}$.


9. Transformation of infinite sampling formulas into corresponding finite sampling formulas. It is a well-known fact that infinite sampling formulas may be obtained from those for finite sampling by letting the size of the parent population become infinite. But, prior to this paper, apparently no one has presented a method of obtaining finite sampling formulas from infinite sampling formulas. However, by making use of the relations between finite, infinite, and generalized sampling, we shall demonstrate that it is possible to transform any infinite sampling formula into the corresponding finite sampling formula.

Since the infinite sampling formulas are obtained from the generalized sampling formulas by replacing

$\mu_{i_1 i_2 \cdots i_n}$ by $\mu_{i_1}\mu_{i_2}\cdots\mu_{i_n}$,

it follows that generalized sampling formulas may be obtained from the infinite sampling formulas by replacing

(1.42) $\mu_{i_1}\mu_{i_2}\cdots\mu_{i_n}$ by $\mu_{i_1 i_2 \cdots i_n}$.

However, it must be emphasized that the application of (1.42) demands formulae which are expressed in terms of moments of sample moments rather than central moments of sample moments (although the sample moments may be measured about a fixed point or about the mean) and the moments of the universe must be measured about a fixed point. The reason for these restrictions is to insure that each term is accounted for individually.

After replacements (1.42) are made in the formula for sampling from an 
infinite population, the resulting formula is the corresponding generalized one. 
The step to the corresponding finite sampling formulas is simply the one outlined 
in section eight, namely, the use of the transformation formulas. 

We shall consider, as the first illustration, the infinite sampling formula for the mean of the sample variance when the moments of the parent population are measured about the mean. The formula is

(1.43) $\mu_{1:\bar m_2} = \dfrac{n-1}{n}\,\bar\mu_2$.

When (1.43) is expressed in terms of moments of the parent population about a fixed point, we have

(1.44) $\mu_{1:\bar m_2} = \dfrac{n-1}{n}\,[\mu_2 - \mu_1^2]$.

Following (1.42), $\mu_1^2$ is replaced by $\mu_{1^2}$ and (1.44) becomes (1.27). The use of the transformation formula for $\mu_{1^2}$ gives (1.41) which, when the moments of the parent population are measured about the mean, becomes

(1.45) $\mu_{1:\bar m_2} = \dfrac{N(n-1)}{n(N-1)}\,\bar\mu_2$.


Infinite sampling formulas expressed in terms of moment-functions may be similarly transformed into the corresponding finite sampling formulas. For example, Craig [9; p. 57] gives the second Thiele seminvariant of the variance of samples as

(1.46) $\lambda_{2:\bar m_2} = \dfrac{(n-1)^2}{n^3}\,\lambda_4 + \dfrac{2(n-1)}{n^2}\,\lambda_2^2$.

First, we express (1.46) in terms of moments about a fixed point by use of the formulas relating Thiele seminvariants and moments [9; p. 12]. We also recall that the resulting formula should be expressed in terms of moments of sample moments rather than in terms of central moments of sample moments. We obtain

(1.47) $\mu_{2:\bar m_2} = \dfrac{n-1}{n^3}\,[(n-1)\mu_4 - 4(n-1)\mu_3\mu_1 + (n^2 - 2n + 3)\mu_2^2 - 2(n-2)(n-3)\mu_2\mu_1^2 + (n-2)(n-3)\mu_1^4]$.




The next step is to transform (1.47) into the corresponding generalized sampling formula by use of (1.42). We obtain (1.32). Since we desire to obtain the finite sampling formula which exactly corresponds to (1.46), it is necessary to transform (1.32) from the second moment of $\bar m_2$ to the variance of $\bar m_2$ and we get (1.36). Next the transformation formulas are applied to (1.36). When the moments of the parent population are measured about the mean and are replaced by Thiele seminvariants, (1.36) becomes

(1.48) $\lambda_{2:\bar m_2} = \dfrac{N(N-n)(n-1)}{n^3(N-1)^2(N-2)(N-3)}\,[(N-1)(Nn - N - n - 1)\lambda_4 + 2(N^2 n - 3Nn - 3N + 3n + 3)\lambda_2^2]$.

Formula (1.48) gives the second Thiele seminvariant of the variance of samples of n drawn from a finite parent population of N. When $N \to \infty$ in (1.48), we obtain immediately (1.46).

It is generally true that infinite sampling formulas are more easily derived than are the corresponding finite sampling formulas. The methods of this section make it possible to derive the desired sampling formulas for the infinite parent population and then transform these infinite sampling formulas into the corresponding finite sampling formulas.
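
The finite sampling result for the second Thiele seminvariant (the variance) of the sample variance lends itself to a direct check. The sketch below is an added illustration, not part of the original paper: it enumerates all samples of n drawn without repetitions from a small hypothetical population and compares the exact variance of the sample variance with the closed form in N, n, $\lambda_2$ and $\lambda_4$, where the seminvariants of the population are computed with divisor N. Function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(sample):
    """Second central moment of a sample, with divisor n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n


def exact_variance_of_sample_variance(population, n):
    """Exact variance over all samples of n drawn without repetitions."""
    values = [sample_variance(s) for s in combinations(population, n)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def population_seminvariants(population):
    """Second and fourth Thiele seminvariants of the population (divisor N)."""
    N = len(population)
    mean = Fraction(sum(population), N)
    c2 = sum((x - mean) ** 2 for x in population) / N
    c4 = sum((x - mean) ** 4 for x in population) / N
    return c2, c4 - 3 * c2 ** 2


def finite_sampling_formula(population, n):
    """Closed form for the variance of the sample variance, finite population."""
    N = len(population)
    lam2, lam4 = population_seminvariants(population)
    prefactor = Fraction(N * (N - n) * (n - 1),
                         n ** 3 * (N - 1) ** 2 * (N - 2) * (N - 3))
    return prefactor * ((N - 1) * (N * n - N - n - 1) * lam4
                        + 2 * (N * N * n - 3 * N * n - 3 * N + 3 * n + 3) * lam2 ** 2)


population = [1, 2, 2, 5, 9, 10]  # hypothetical values; N must exceed 3
for n in (2, 3, 4, 5):
    exact = exact_variance_of_sample_variance(population, n)
    print(n, exact, exact == finite_sampling_formula(population, n))
```

The formula requires N > 3 because of the factors (N - 2)(N - 3) in the denominator.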


II. Moment Function Adjustments for Grouped Data 


A given distribution of discrete variates may be grouped in “k groupings of k”. We desire to find the correction which eliminates the error made in replacing a given moment of the original distribution by the average of the corresponding moments of the k grouped-distributions.

Formulas for the adjustments for moments of a grouped-distribution of discrete variates were first given (without proof) in the Editorial of Vol. I, No. 1 of the Annals of Mathematical Statistics. Later, more satisfactory derivations of adjustment formulas were given by Abernethy [24], Craig [25] and Carver [26]. However, it was observed by Carver [26; p. 162] that the developments of Abernethy and Craig are adjustments about a fixed point and that they fail to hold for the case of expectations of central moments if we accept the definition

$\mu_{1:\bar m_t} = \dfrac{1}{k}\sum_{r=1}^{k} {}_r\bar m_t$, (t = 2, 3, $\cdots$).

Here ${}_r\bar m_t$ represents the tth central moment of the rth grouped-distribution. The formula for the true value of $\mu_{1:\bar m_2}$ was supplied by Carver [26; p. 162] but he did not indicate a general method which might be used for the derivation of $\mu_{1:\bar m_t}$, (t > 2).

A distribution of discrete variates grouped in “k groupings of k” is a special case of a universe of n finite populations and hence the methods and formulas for the expectations of population moments are applicable to our present problem.





It is found that the adjustment formulas for moment-functions of grouped data involve central moments of a rectangular distribution. It will be convenient for our present purposes to give a brief treatment of the moment-functions of a rectangular distribution.

1. Moment-functions of a rectangular distribution. Consider the rectangular distribution of discrete variates,

(2.1) $h,\ 2h,\ 3h,\ \cdots,\ kh$.

It is readily shown that the moment generating function of (2.1),

(2.2) $G_x(\theta) = \mu_0 + \mu_1\theta + \mu_2\dfrac{\theta^2}{2!} + \cdots$,

may be written

(2.3) $G_x(\theta) = \dfrac{e^{\frac{1}{2}(k+1)h\theta}\,\sinh\frac{1}{2}kh\theta}{k\,\sinh\frac{1}{2}h\theta}$.

Setting the expansion of the right member of (2.3) equal to the right member of (2.2) and equating coefficients of like powers of $\theta$, we obtain the following recursion formula for the moments of (2.1):

(2.4) $\dfrac{(n+1)^{(1)}}{1!}\,\mu_{n:R} - \dfrac{(n+1)^{(2)}}{2!}\,h\,\mu_{n-1:R} + \cdots + (-1)^{r-1}\dfrac{(n+1)^{(r)}}{r!}\,h^{r-1}\mu_{n-r+1:R} + \cdots = k^n h^n$,

where $\mu_{n:R}$ represents the nth moment of a rectangular distribution. Formulas for $\mu_{n:R}$, (n = 0, 1, $\cdots$, 10) are given below. See Sasuly [27; p. 27].

(2.5)

$\mu_{0:R} = 1$.

$\mu_{1:R} = \frac{1}{2}(k+1)h$.

$\mu_{2:R} = \frac{1}{6}(k+1)(2k+1)h^2 = \frac{1}{3}(2k+1)h\,\mu_{1:R}$.

$\mu_{3:R} = \frac{1}{4}(k+1)^2 k h^3 = kh\,\mu_{1:R}^2$.

$\mu_{4:R} = \frac{1}{5}(3k^2 + 3k - 1)h^2\,\mu_{2:R}$.

$\mu_{5:R} = \frac{1}{3}(2k^2 + 2k - 1)h^2\,\mu_{3:R}$.

$\mu_{6:R} = \frac{1}{7}(3k^4 + 6k^3 - 3k + 1)h^4\,\mu_{2:R}$.

$\mu_{7:R} = \frac{1}{6}(3k^4 + 6k^3 - k^2 - 4k + 2)h^4\,\mu_{3:R}$.

$\mu_{8:R} = \frac{1}{15}(5k^6 + 15k^5 + 5k^4 - 15k^3 - k^2 + 9k - 3)h^6\,\mu_{2:R}$.

$\mu_{9:R} = \frac{1}{5}(2k^6 + 6k^5 + k^4 - 8k^3 + k^2 + 6k - 3)h^6\,\mu_{3:R}$.

$\mu_{10:R} = \frac{1}{11}(3k^8 + 12k^7 + 8k^6 - 18k^5 - 10k^4 + 24k^3 + 2k^2 - 15k + 5)h^8\,\mu_{2:R}$.





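The deviations about the mean of (2.1) are

(2.6) $-\frac{1}{2}(k-1)h,\ -\frac{1}{2}(k-3)h,\ \cdots,\ \frac{1}{2}(k-3)h,\ \frac{1}{2}(k-1)h$.

Therefore,

(2.7) $\bar\mu_{2n+1:R} = 0$.

If we denote (2.6) by $\bar x$, we have

(2.8) $G_{\bar x}(\theta) = \dfrac{\sinh\frac{1}{2}kh\theta}{k\,\sinh\frac{1}{2}h\theta}$.

The recursion formula for central moments of (2.1) is

(2.9) $\dfrac{(2n+1)^{(1)}}{1!}\,\bar\mu_{2n:R} + \dfrac{h^2}{2^2}\dfrac{(2n+1)^{(3)}}{3!}\,\bar\mu_{2n-2:R} + \cdots + \dfrac{h^{r}}{2^{r}}\dfrac{(2n+1)^{(r+1)}}{(r+1)!}\,\bar\mu_{2n-r:R} + \cdots = \dfrac{k^{2n}h^{2n}}{2^{2n}}$,

in which r takes even values only.

Formulas for $\bar\mu_{2n:R}$, (n = 0, 1, $\cdots$, 5) are given below. See [27; p. 27].

(2.10)

$\bar\mu_{0:R} = 1$,

$\bar\mu_{2:R} = \frac{1}{12}(k^2 - 1)h^2$,

$\bar\mu_{4:R} = \frac{1}{20}(3k^2 - 7)h^2\,\bar\mu_{2:R}$,

$\bar\mu_{6:R} = \frac{1}{112}(3k^4 - 18k^2 + 31)h^4\,\bar\mu_{2:R}$,

$\bar\mu_{8:R} = \frac{1}{960}(5k^6 - 55k^4 + 239k^2 - 381)h^6\,\bar\mu_{2:R}$,

$\bar\mu_{10:R} = \frac{1}{2816}(3k^8 - 52k^6 + 410k^4 - 1636k^2 + 2555)h^8\,\bar\mu_{2:R}$.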
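
The closed forms for the moments of the discrete rectangular distribution can be confirmed by direct summation. The following Python sketch is an added illustration, not part of the original paper: it builds the variates h, 2h, ..., kh, computes a few moments about the origin and about the mean exactly, and compares them with the corresponding closed forms for orders up to six. Function names are hypothetical.

```python
from fractions import Fraction


def rectangular_values(k, h=1):
    """The discrete rectangular distribution h, 2h, ..., kh."""
    return [Fraction(j) * h for j in range(1, k + 1)]


def raw_moment(values, t):
    return sum(v ** t for v in values) / len(values)


def central_moment(values, t):
    mean = raw_moment(values, 1)
    return sum((v - mean) ** t for v in values) / len(values)


def closed_form_central(k, h, order):
    """Closed forms for the even central moments of orders 2, 4 and 6."""
    mu2 = Fraction(k * k - 1, 12) * h ** 2
    if order == 2:
        return mu2
    if order == 4:
        return Fraction(3 * k * k - 7, 20) * h ** 2 * mu2
    if order == 6:
        return Fraction(3 * k ** 4 - 18 * k ** 2 + 31, 112) * h ** 4 * mu2
    raise ValueError("only orders 2, 4 and 6 are coded in this sketch")


for k in range(2, 8):
    values = rectangular_values(k, Fraction(1, 2))
    checks = [central_moment(values, t) == closed_form_central(k, Fraction(1, 2), t)
              for t in (2, 4, 6)]
    odd_vanish = all(central_moment(values, t) == 0 for t in (1, 3, 5))
    print(k, checks, odd_vanish)
```

The odd central moments vanish by symmetry, which is what allows the adjustment formulas to involve even orders only.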

From the relation which connects Thiele seminvariants and the moment generating function, we get, see [25; p. 57],

$\lambda_{0:R} = 0,\quad \lambda_{1:R} = \frac{1}{2}(k+1)h,\quad \lambda_{2n+1:R} = 0$,

(2.11) $\lambda_{2n:R} = (-1)^{n+1}\dfrac{B_n h^{2n}(k^{2n} - 1)}{2n}$, (n = 1, 2, 3, $\cdots$),

where $\lambda_{n:R}$ represents the nth Thiele seminvariant of a rectangular distribution of discrete variates and $B_n$, (n = 1, 2, $\cdots$), the Bernoulli numbers: $\frac{1}{6}, \frac{1}{30}, \frac{1}{42}, \cdots$.

In each of the cases considered in this section, corresponding formulas may be found for a rectangular distribution of continuous variates by setting h = m/k (which makes the range m with k subdivisions) and then letting $k \to \infty$.
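
The even Thiele seminvariants of the rectangular distribution can be compared with the Bernoulli-number formula by a short computation. This Python sketch is an added illustration, not part of the original paper. It assumes the standard relations between seminvariants and central moments of a symmetric distribution, $\lambda_2 = \bar\mu_2$, $\lambda_4 = \bar\mu_4 - 3\bar\mu_2^2$, $\lambda_6 = \bar\mu_6 - 15\bar\mu_4\bar\mu_2 + 30\bar\mu_2^3$, and the Bernoulli numbers in the convention 1/6, 1/30, 1/42. Names are hypothetical.

```python
from fractions import Fraction

# Bernoulli numbers in the older convention B_1 = 1/6, B_2 = 1/30, B_3 = 1/42.
BERNOULLI = {1: Fraction(1, 6), 2: Fraction(1, 30), 3: Fraction(1, 42)}


def rectangular_central_moments(k, h):
    """Central moments of orders 2, 4, 6 of the deviations (k - (2r - 1))h/2."""
    devs = [Fraction(k - (2 * r - 1), 2) * h for r in range(1, k + 1)]
    return {t: sum(d ** t for d in devs) / k for t in (2, 4, 6)}


def seminvariants_from_moments(k, h):
    """Seminvariants of a symmetric distribution from its central moments."""
    c = rectangular_central_moments(k, h)
    return {
        2: c[2],
        4: c[4] - 3 * c[2] ** 2,
        6: c[6] - 15 * c[4] * c[2] + 30 * c[2] ** 3,
    }


def bernoulli_formula(k, h, n):
    """(-1)^(n+1) B_n h^(2n) (k^(2n) - 1) / (2n)."""
    return (-1) ** (n + 1) * BERNOULLI[n] * Fraction(h) ** (2 * n) * (k ** (2 * n) - 1) / (2 * n)


for k in range(2, 7):
    lam = seminvariants_from_moments(k, 1)
    print(k, [lam[2 * n] == bernoulli_formula(k, 1, n) for n in (1, 2, 3)])
```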

2. Adjustments for moments. As our basic distribution we consider the set of discrete variates, $x_i$, (i = 1, 2, $\cdots$, N), where some of the $x_i$'s may not be distinct. We assume that the given distribution is grouped in “k groupings of k”.

When $x_i$ is placed in the rth position of a class, the limits of the class are $x_i - (r-1)h$ and $x_i + (k-r)h$ and the class mark is $x_i + \frac{k - (2r-1)}{2}h$. Thus the quantity $\frac{k - (2r-1)}{2}h$ is added to the true value of $x_i$. Therefore, when the expected value of a particular moment for “k groupings of k” is found, each variate has made a definite contribution as it was placed in each of the k positions of a class.

For convenience, we define

(2.12) $\epsilon_r = \dfrac{k - (2r-1)}{2}\,h$, (r = 1, 2, $\cdots$, k).

As was previously indicated, the expected value of a given moment involves the contribution of each variate as it occupies the k class positions. A convenient method of finding these contributions is by means of a universe which is composed of the populations ${}_rX$, (r = 1, 2, $\cdots$, k). The rth population consists of the values of the variates when they occupy the rth position of the class. Hence ${}_rX$ consists of ${}_rx_i = x_i + \epsilon_r$, (i = 1, 2, $\cdots$, N).

The notation for moments is the same as that of Part I. Since this universe is of the same form as the universe studied in Part I, we use the definitions (1.1) of that part.

The expected value of the tth moment is

$\mu_{1:m_t} = \dfrac{1}{kN}\sum_{r=1}^{k}\sum_{i=1}^{N}(x_i + \epsilon_r)^t = \sum_{s=0}^{t}\binom{t}{s}\mu_{t-s}\,\dfrac{1}{k}\sum_{r=1}^{k}\epsilon_r^{s}$.

Many devices have been used by previous writers [24; p. 209], [25; p. 57], [26; p. 157], to evaluate terms of the form $\frac{1}{k}\sum_{r=1}^{k}\epsilon_r^{s}$. However, it should be noticed that the quantities $\epsilon_r$, (r = 1, 2, $\cdots$, k), are respectively identical with the deviations (2.6) about the mean of a rectangular distribution of discrete variates. It follows that

$\dfrac{1}{k}\sum_{r=1}^{k}\epsilon_r^{s} = \bar\mu_{s:R}$.

And since $\bar\mu_{2n+1:R} = 0$, we have

(2.13) $\mu_{1:m_t} = \sum_{s=0}^{[t/2]}\binom{t}{2s}\mu_{t-2s}\,\bar\mu_{2s:R}$.

Formulas for $\bar\mu_{2s:R}$, (s = 0, 1, $\cdots$, 5) are given by (2.10).

If the class marks are selected as the unit of x, we set h = 1 in (2.10). If the class interval is chosen as the unit of x, we set h = 1/k in (2.10). If k consecutive values of the discrete variable are grouped in a frequency class of width m, we put h = m/k in (2.10).
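
The expected value of the tth moment over the k groupings can be verified by actually forming the groupings. The Python sketch below is an added illustration, not part of the original paper: with h = 1 it forms each of the k possible groupings of a small hypothetical set of integer variates, replaces every variate by its class mark, averages the tth moments about the origin over the groupings, and compares the average with $\sum_s \binom{t}{2s}\mu_{t-2s}\bar\mu_{2s:R}$. The data and all function names are assumptions.

```python
from fractions import Fraction
from math import comb


def class_marks(xs, k, shift):
    """Replace each integer variate by the mark of its class of k consecutive values.

    Classes are [shift + k*j, shift + k*j + k - 1]; the mark is the midpoint.
    """
    marks = []
    for x in xs:
        lower = shift + k * ((x - shift) // k)
        marks.append(Fraction(2 * lower + k - 1, 2))
    return marks


def raw_moment(values, t):
    return sum(Fraction(v) ** t for v in values) / len(values)


def rectangular_central_moment(k, s):
    """Central moment of order s of the deviations (k - (2r - 1))/2, with h = 1."""
    devs = [Fraction(k - (2 * r - 1), 2) for r in range(1, k + 1)]
    return sum(d ** s for d in devs) / k


def average_grouped_moment(xs, k, t):
    """Average of the t-th moment over the k possible groupings."""
    return sum(raw_moment(class_marks(xs, k, shift), t) for shift in range(k)) / k


def expected_grouped_moment(xs, k, t):
    """Sum over s of C(t, 2s) * mu_(t-2s) * central rectangular moment of order 2s."""
    return sum(comb(t, 2 * s) * raw_moment(xs, t - 2 * s) * rectangular_central_moment(k, 2 * s)
               for s in range(t // 2 + 1))


xs = [1, 2, 2, 5, 7, 8, 8, 8, 11]  # hypothetical integer variates
for k in (3, 4):
    print(k, [average_grouped_moment(xs, k, t) == expected_grouped_moment(xs, k, t)
              for t in range(1, 6)])
```

Each variate occupies every class position exactly once as the shift runs over 0, 1, ..., k - 1, which is why the agreement is exact.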

Usually we desire to estimate the value of the moments that would have been obtained if we had not grouped the data. Therefore (2.13) is solved for the moments of the ungrouped data. We have

(2.14) $\mu_t = \sum_{s=0}^{[t/2]}\binom{t}{2s}P_s\,\mu_{1:m_{t-2s}}$,

wherein

$P_s = \sum \dfrac{(-1)^{\rho}(2s)!\,\rho!}{[(2p_1)!]^{\pi_1}[(2p_2)!]^{\pi_2}\cdots[(2p_v)!]^{\pi_v}\,\pi_1!\,\pi_2!\cdots\pi_v!}\;\bar\mu_{2p_1:R}^{\pi_1}\,\bar\mu_{2p_2:R}^{\pi_2}\cdots\bar\mu_{2p_v:R}^{\pi_v}$,

the summation being taken for every possible product of moments for which

$\sum_{i=1}^{v} p_i\pi_i = s,\quad \sum_{i=1}^{v}\pi_i = \rho$.

Formulas, corresponding to (2.13) and (2.14), for a distribution of continuous variates are written by replacing the moment symbols for discrete variates by those for continuous variates.

3. Adjustments for central moments. Consider the universe which consists of the populations ${}_rX$, (r = 1, 2, $\cdots$, k), where ${}_rX$ is the rth grouped-distribution.

The expected value of the tth central moment of the k grouped-distributions is given by (1.3), (1.4) and (1.5) of Part I, where now $\mu_{1:m_t}$ is given by (2.13) of the preceding section. Thus, the development of this section is identical with that of section one of Part I with the single exception that $\mu_{1:m_t} = \mu_t$ no longer holds but is replaced by $\mu_{1:m_t} = \mu_t$ + a correction. Therefore, the formulas for the adjustments for central moments may be obtained immediately from the formulas derived in section one, Part I, if the corrections of the preceding section are inserted. We have

(2.15) $\mu_{1:\bar m_2} = \bar\mu_2 + \bar\mu_{2:R} - \bar\mu_{2:m_1}$.

(2.16) $\mu_{1:\bar m_3} = \bar\mu_3 + 6\mu_1\bar\mu_{2:m_1} - 3\bar\mu_{11:m_1 m_2} + 2\bar\mu_{3:m_1}$.

(2.17) $\mu_{1:\bar m_4} = \bar\mu_4 + 6\bar\mu_2\bar\mu_{2:R} + \bar\mu_{4:R} + 6(\bar\mu_2 - 2\mu_1^2 + \bar\mu_{2:R})\bar\mu_{2:m_1} + 12\mu_1\bar\mu_{11:m_1 m_2} - 12\mu_1\bar\mu_{3:m_1} - 4\bar\mu_{11:m_1 m_3} + 6\bar\mu_{21:m_1 m_2} - 3\bar\mu_{4:m_1}$.

The moments of the ungrouped data can be obtained readily from formulas (2.15) through (2.17).

Adjustment formulas for central moments of a distribution of continuous variates may be obtained from (2.13) by replacing the moment symbols for discrete variates by those for continuous variates and taking the moments about the mean. Also, it may be observed that adjustment formulas for central moments of a distribution of continuous variates may be obtained from formulas (1.3), (1.4) and (1.5) of Part I, provided the moment symbols are exchanged as indicated above and terms of the form $\bar\mu_{i_1 i_2 \cdots i_n : m_{t_1} m_{t_2} \cdots m_{t_n}}$ are set equal to zero.

4. Usual adjustments for Thiele seminvariants. The usual adjustments for Thiele seminvariants, for the univariate discrete population, may be developed directly by use of one of the fundamental properties of Thiele seminvariants.

It is assumed (see [25; p. 55]) that k consecutive values of the discrete variable are grouped in a frequency class of width m. The k smaller intervals of width m/k = h go to make up the class width m, the actual points representing the k values of the variable being plotted at the centers of the sub-intervals. Now, let us suppose that each of the k consecutive boundary points of the subintervals is as likely to be chosen as a boundary point of the larger intervals as any other. Then, if $x_i$ is the class mark of the ith frequency class, for any true value, x, of the discrete variable included in this frequency class, we have

$x_i = x + \epsilon_r$

in which x and $\epsilon_r$ are independent variables and $\epsilon_r$ takes on the k values (2.12) with equal relative frequencies 1/k.

Since we have noted that the equally likely values which $\epsilon_r$ may take on are deviations about the mean of a rectangular distribution of discrete variates, we employ the cumulative property of Thiele seminvariants [9; p. 4] and obtain directly

(2.18) $\lambda'_{t:x} = \lambda_{t:x} + \lambda_{t:R}$, (t = 1, 2, $\cdots$),

where $\lambda'_{t:x}$ is the tth seminvariant computed from the grouped data, $\lambda_{t:x}$ is the tth seminvariant computed from the ungrouped data and $\lambda_{t:R}$ is defined by (2.11).

Formulas corresponding to (2.18), for special values of t, are given by Craig [25; p. 57]. However, the present development indicates the dependence of adjustment formulas on central moments of a rectangular distribution and provides a general formula for these adjustments which is expressed completely in terms of Thiele seminvariants.


5. New adjustments for Thiele seminvariants. If we accept the definition

$\mu_{1:\lambda_t} = \dfrac{1}{k}\sum_{r=1}^{k} {}_r\lambda_t$, (t = 2, 3, $\cdots$),

then (2.18) is at best only an approximation formula. We now desire exact formulas for $\mu_{1:\lambda_t}$ for the case of a grouped-distribution of discrete variates.

First (1.9) is used and terms of the form $\mu_{i_1 i_2 \cdots : \bar m_{t_1}\bar m_{t_2}\cdots}$ are evaluated in terms of central moments by (1.3). Then terms of the form $\mu_{1:m_t}$ are evaluated by (2.13) and finally the relations between moments and Thiele seminvariants are employed. Exact formulas for the expected values of the second, third, and fourth Thiele seminvariants for grouped-distributions of discrete variables are given below.

(2.19) $\mu_{1:\lambda_2} = \lambda_2 + \lambda_{2:R} - \bar\mu_{2:m_1}$.

(2.20) $\mu_{1:\lambda_3} = \lambda_3 + 6\lambda_1\bar\mu_{2:m_1} - 3\bar\mu_{11:m_1 m_2} + 2\bar\mu_{3:m_1}$.

(2.21) $\mu_{1:\lambda_4} = \lambda_4 + \lambda_{4:R} + 12[\lambda_2 - 2\lambda_1^2 + \lambda_{2:R}]\bar\mu_{2:m_1} + 24[\bar\mu_{11:m_1 m_2} - \bar\mu_{3:m_1}]\lambda_1 - 4\bar\mu_{11:m_1 m_3} + 12\bar\mu_{21:m_1 m_2} - 6\bar\mu_{4:m_1} - 3\bar\mu_{2:m_2}$.

Formulas for Thiele seminvariants of ungrouped data in terms of expectations may be obtained from (2.19) through (2.21).

Adjustment formulas for Thiele seminvariants of a distribution of continuous variates are given by Langdon and Ore [23; p. 231] and Craig [25; p. 57]. If we denote the tth Thiele seminvariant of a distribution of continuous variates by $L_t$, then

(2.22) $\mu_{1:L_t} = L_t + L_{t:R}$,

where

(2.23) $L_{2t+1:R} = 0,\quad L_{2t:R} = \dfrac{(-1)^{t-1}B_t m^{2t}}{2t}$, (t = 1, 2, $\cdots$).

Formulas (2.19) through (2.21) may be used for continuous variates by changing the moment symbols and setting terms of the form $\bar\mu_{i_1 i_2 \cdots i_n : m_{t_1} m_{t_2} \cdots m_{t_n}}$ equal to zero.

6. Adjustment formulas applied to a numerical problem. We consider the 
arbitrary distribution given in Table III. 


TABLE III

An Arbitrary Distribution of Discrete Variates

 V    f      V    f      V    f      F
 1    2      4   30      7    1      2 + 30 + 1 = 33
 2    8      5    4      8    1      8 + 4 + 1 = 13
 3   10      6    3      9    1      10 + 3 + 1 = 14


The three grouped distributions, when the variates are grouped in “groupings 
of three,” appear in Table IV. 


TABLE IV

Distributions Derived from Data of Table III by Making the Three Possible Groupings of Three

      (1)               (2)               (3)
 Class     f       Class     f       Class      f
 1-3      20       0-2      10       -1 to 1    2
 4-6      37       3-5      44       2-4       48
 7-9       3       6-8       5       5-7        8
 10-12     0       9-11      1       8-10       2

Using the fixed point 4, moment-functions are computed for the distribution of 
Table III and for each of the distributions of Table IV. These quantities 
along with the average of each moment function appear in Table V. 

TABLE V

Moment-Functions of the Distributions of Table III and Table IV. Averages of Moment-Functions of Distributions of Table IV

Each entry is the numerator of a fraction whose denominator appears in the last row.

 Dist.         mu_1   mu_2   mu_3   mu_4   mu-bar_2 = lambda_2   mu-bar_3 = lambda_3   mu-bar_4       lambda_4
 (1)              9    165     69   1125        9819                 -17442            238,849,317    -50,388,966
 (2)             -9    171     81   2511       10179                 567162            557,840,277    247,004,154
 (3)            -30    162    138   1938        8820                1317600            528,282,000    294,904,800
 Ave.           -10    166     96   1858        9606                 622440            441,657,198    163,839,996
 Orig. Dist.    -10    126    116   1314        7460                 642400            305,034,000    138,079,200
 Denominator     60     60     60     60       (60)^2                (60)^3            (60)^4         (60)^4
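
The entries of Table V can be recomputed from the frequencies of Table III. The Python sketch below is an added illustration, not part of the original paper: it forms the three groupings of three, measures moments about the fixed point 4, converts them to central moments and to the fourth seminvariant, and averages over the groupings. Function and variable names are hypothetical.

```python
from fractions import Fraction

TABLE_III = {1: 2, 2: 8, 3: 10, 4: 30, 5: 4, 6: 3, 7: 1, 8: 1, 9: 1}
ORIGIN = 4  # the fixed point used in the text


def group(dist, k, first_lower):
    """Group integer variates into classes of k values; keys are class marks."""
    grouped = {}
    for x, f in dist.items():
        lower = first_lower + k * ((x - first_lower) // k)
        mark = Fraction(2 * lower + k - 1, 2)
        grouped[mark] = grouped.get(mark, 0) + f
    return grouped


def moment_functions(dist, origin=ORIGIN):
    """Moments about `origin`, central moments, and the fourth seminvariant."""
    total = sum(dist.values())
    m1, m2, m3, m4 = (sum(f * (Fraction(x) - origin) ** t for x, f in dist.items()) / total
                      for t in (1, 2, 3, 4))
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    c4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return {"mu1": m1, "mu2": m2, "mu3": m3, "mu4": m4,
            "c2": c2, "c3": c3, "c4": c4, "lambda4": c4 - 3 * c2 ** 2}


# Groupings (1), (2), (3) of Table IV begin at 1, 0 and -1 respectively.
grouped_rows = [moment_functions(group(TABLE_III, 3, lower)) for lower in (1, 0, -1)]
average = {key: sum(row[key] for row in grouped_rows) / 3 for key in grouped_rows[0]}
original = moment_functions(TABLE_III)

print(average["c2"] * 60 ** 2, original["c2"] * 60 ** 2)
print(average["lambda4"] * 60 ** 4, original["lambda4"] * 60 ** 4)
```

Multiplying by the appropriate power of 60 recovers the integer numerators shown in the table.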


Table VI gives the expected values of the moment-functions as obtained by 
substituting from Table V into the formulas of sections two, three, and five. 
Also the expected values, computed from the usual formulas, are given and the 
errors which would be made, if the usual formulas were used, are indicated. 












TABLE VI

Expected Values of Moment-Functions Computed by Formulas

Each entry is the numerator of a fraction whose denominator appears in the last row.

 Expectations by    mu_{1:m_1}   mu_{1:m_2}   mu_{1:m_3}   mu_{1:m_4}   mu_{1:m-bar_2} = mu_{1:lambda_2}   mu_{1:m-bar_3} = mu_{1:lambda_3}   mu_{1:m-bar_4}   mu_{1:lambda_4}
 New Formulas          -10          166           96         1858              9606                              622440                        441,657,198      163,839,996
 Usual Formulas        -10          166           96         1858              9860                              642400                        416,778,000      133,795,200
 Error                  —            —            —            —                254                               19960                        -24,879,198      -30,060,796
 Denominator            60           60           60           60             (60)^2                             (60)^3                         (60)^4           (60)^4


7. Evaluation of $\bar\mu_{2:m_1}$. It appears at first that it is necessary to form the “k groupings of k” in order to evaluate the term $\bar\mu_{2:m_1}$ which enters the precise formula for the expected value of the variance. That was the procedure followed by Carver [20; p. 101]. However, it is possible to evaluate $\bar\mu_{2:m_1}$ from the ungrouped data without forming a single grouped-distribution.

By definition,

$\bar\mu_{2:m_1} = \dfrac{1}{k}\sum_{r=1}^{k}({}_r\mu_1 - \mu_1)^2$,

where ${}_r\mu_1$ is the mean of the rth grouped-distribution and $\mu_1$ is the mean of the ungrouped distribution. We wish to study the terms ${}_r\mu_1$ and $\mu_1$. Consider a set of variates $x_i$, (i = 1, 2, $\cdots$, s), with corresponding frequencies $f_i$, (i = 1, 2, $\cdots$, s). The x's are subject to the condition $x_i - x_{i-1} = 1$, and consequently some of the f's may be zero. The mean of this distribution is $\sum xf / \sum f$.

We define

$F_i = f_i + f_{k+i} + f_{2k+i} + \cdots$, (i = 1, 2, $\cdots$, k).

Then, if a grouped-distribution is formed with $x_1$ in the ith (i = 1, 2, $\cdots$, k) position of a class, the mean of this grouped-distribution is

$\dfrac{\sum xf + \sum_{j=1}^{k}F_j\epsilon_{i+j-1}}{\sum f}$,

where $\epsilon_{i-1} = \epsilon_k$ if i = 1 and $\epsilon_{i+1} = \epsilon_1$ if $\epsilon_i = \epsilon_k$. Similarly, if a grouped-distribution is formed with $x_1$ in the (i + 1)st position of a class, the mean is

$\dfrac{\sum xf + \sum_{j=1}^{k}F_j\epsilon_{i+j}}{\sum f}$.

Thus, it is evident that, given the expression for the mean of any grouped-distribution in which $x_1$ is in the ith position of a class, we may form the expression for the mean of the grouped-distribution in which $x_1$ is in the (i + 1)st position of a class by a cyclic permutation of the $\epsilon$'s of the given expression.

Therefore, it follows that if we call ${}_r\mu_1$ the mean of the grouped-distribution in which $x_1$ is in the rth (r = 1, 2, $\cdots$, k) position of a class, then

${}_r\mu_1 - \mu_1 = \dfrac{\sum_{j=1}^{k}F_j\epsilon_{r+j-1}}{\sum f}$, (r = 1, 2, $\cdots$, k).

If we define

$N = \sum f$ and $\theta_r = \sum_{j=1}^{k}F_j\epsilon_{r+j-1}$

then,

$\bar\mu_{2:m_1} = \dfrac{1}{kN^2}\sum_{r=1}^{k}\theta_r^2$.

Thus, it is evident that $\bar\mu_{2:m_1}$ is a function of the frequencies of the variates and of the $\epsilon$'s. The fact that the values of the variates do not enter permits one to calculate its value quickly.

Consider $\bar\mu_{2:m_1}$ for the distribution of Table III. We find

$\theta_1 = 33\epsilon_1 + 13\epsilon_2 + 14\epsilon_3$.

Then, by successive cyclic permutations of the $\epsilon$'s,

$\theta_2 = 33\epsilon_2 + 13\epsilon_3 + 14\epsilon_1$,

$\theta_3 = 33\epsilon_3 + 13\epsilon_1 + 14\epsilon_2$.

Substituting the values $\epsilon_1 = 1$, $\epsilon_2 = 0$, $\epsilon_3 = -1$ we have $\theta_1 = 19$, $\theta_2 = 1$ and $\theta_3 = -20$. Therefore,

$\bar\mu_{2:m_1} = \dfrac{254}{(60)^2}$,

which is identical with the value which was found when Table V was used.

It follows from the preceding development that

$\bar\mu_{t:m_1} = \dfrac{1}{kN^t}\sum_{r=1}^{k}\theta_r^t$

and if $F_1 = F_2 = \cdots = F_k$ then $\bar\mu_{t:m_1}$ is zero.
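
The evaluation from the frequencies alone takes only a few lines of code. The Python sketch below is an added illustration, not part of the original paper: it accumulates the sums $F_j$ from the Table III frequencies, forms the $\theta_r$ by cyclic permutation of the $\epsilon$'s of (2.12) with h = 1, and evaluates $\frac{1}{kN^2}\sum\theta_r^2$. Function names are hypothetical.

```python
from fractions import Fraction


def position_sums(freqs, k):
    """F_i = f_i + f_(k+i) + f_(2k+i) + ... for i = 1, ..., k (0-indexed list)."""
    return [sum(freqs[i::k]) for i in range(k)]


def epsilons(k, h=1):
    """The k values (k - (2r - 1))h/2 for r = 1, ..., k."""
    return [Fraction(k - (2 * r - 1), 2) * h for r in range(1, k + 1)]


def thetas(freqs, k, h=1):
    """theta_r = sum over j of F_j * eps_(r+j-1), subscripts taken cyclically."""
    F = position_sums(freqs, k)
    eps = epsilons(k, h)
    return [sum(F[j] * eps[(r + j) % k] for j in range(k)) for r in range(k)]


def variance_of_grouped_means(freqs, k, h=1):
    """(1 / (k N^2)) * sum of theta_r squared."""
    N = sum(freqs)
    return sum(t ** 2 for t in thetas(freqs, k, h)) / (k * N ** 2)


table_iii_freqs = [2, 8, 10, 30, 4, 3, 1, 1, 1]  # frequencies of the variates 1, ..., 9
print(position_sums(table_iii_freqs, 3))
print(thetas(table_iii_freqs, 3))
print(variance_of_grouped_means(table_iii_freqs, 3))
```

Only the frequencies enter; the values of the variates are never used, as the text observes.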

8. Conclusion. The results of this paper include:

1. The derivation of general and specific formulas for the expected values of population moment-functions.

2. The derivation of generalized sampling formulas under the condition that samples of n are formed by selecting one variate from each population.

3. Methods for the transformation of generalized sampling formulas into the corresponding infinite and finite sampling formulas.

4. A method for the transformation of infinite sampling formulas into the corresponding finite sampling formulas.

5. A demonstration of the fact that adjustment formulas for moment-functions of grouped data involve central moments of a rectangular distribution.

6. A general formula for the expected value of the tth moment of grouped data.

7. New adjustment formulas for central moments of grouped data.

8. New adjustment formulas for Thiele seminvariants of grouped data.

9. A method for the evaluation of the term $\bar\mu_{2:m_1}$ which appears in the precise adjustment formula for the variance.

Many thanks are due Prof. P. S. Dwyer, to whom the writer is greatly in¬ 
debted for advice and encouragement. 

REFERENCES

[1] “Student”, “The Probable Error of the Mean,” Biometrika, Vol. 6 (1908), pp. 1-25.

[2] L. Isserlis, “On the Value of a Mean as Calculated from a Sample,” Jour. of Roy. Stat. Soc., Vol. 81 (1918), pp. 75-81.

[3] A. Tchouproff, “On the Mathematical Expectation of the Moments of Frequency Distributions,” Part I, Biometrika, Vol. 12 (1918), pp. 140-169, 185-210; Part II, Biometrika, Vol. 13 (1920-21), pp. 283-295.

[4] A. Tchouproff, “On the Mathematical Expectation of the Moments of Frequency Distributions in the Case of Correlated Observations,” Metron, Vol. 2 (1923), pp. 461-493; 646-683.

[5] A. Church, “On the Moments of the Distributions of Squared Deviations for Samples of N Drawn from an Indefinitely Large Population,” Biometrika, Vol. 17 (1925), pp. 79-83.

[6] J. Neyman, “Contributions to the Theory of Small Samples Drawn from a Finite Population,” Biometrika, Vol. 17 (1925), pp. 472-479.

[7] A. Church, “On the Means and Squared Deviations of Small Samples from any Population,” Biometrika, Vol. 18 (1926), pp. 321-394.

[8] K. Pearson, “Another Historical Note on the Problem of Small Samples,” Biometrika, Vol. 19 (1927), pp. 207-210.

[9] C. C. Craig, “An Application of Thiele's Seminvariants to the Sampling Problem,” Metron, Vol. 7 (1928-29), pp. 3-74.

[10] R. A. Fisher, “Moments and Product Moments of Sampling Distributions,” Proc. London Math. Soc., Vol. 2 (30) (1929), pp. 199-238.

[11] J. Pepper, “Studies in the Theory of Sampling,” Biometrika, Vol. 21 (1929), pp. 231-258.

[12] H. C. Carver, “Fundamentals of Sampling,” Annals of Math. Stat., Vol. 1 (1930), pp. 101-121, 260-274.

[13] N. St. Georgescu, “Further Contributions to the Sampling Problem,” Biometrika, Vol. 24 (1932), pp. 65-107.

[14] G. M. Brown, “On Sampling from Compound Populations,” Annals of Math. Stat., Vol. 4 (1933), pp. 288-342.

[15] J. Neyman, “On the Two Different Aspects of the Representative Method,” Jour. Roy. Stat. Soc., Vol. 97 (1934), pp. 558-625.

[16] A. L. Bowley, “Measurement of the Precision Attained in Sampling,” Bull. International Stat. Inst., Vol. 22.

[17] H. Feldman, “Mathematical Expectations of Product Moments of Samples Drawn from a Set of Infinite Populations,” Annals of Math. Stat., Vol. 6 (1935), pp. 30-52.

[18] H. C. Carver, “Sheppard's Corrections for a Discrete Variable,” Annals of Math. Stat., Vol. 7 (1936), pp. 154-163.

[19] P. S. Dwyer, “Moments of Any Rational, Integral Isobaric Sample Moment Function,” Annals of Math. Stat., Vol. 8 (1937), pp. 21-65.

[20] P. S. Dwyer, “Combined Expansions of Products of Symmetric Power Sums and of Sums of Symmetric Power Products with Application to Sampling,” Annals of Math. Stat., Vol. 9 (1938), pp. 1-84.

[21] A. T. Craig, “On the Mathematics of the Representative Method of Sampling,” Annals of Math. Stat., Vol. 10 (1939), pp. 26-34.

[22] A. Fisher, The Mathematical Theory of Probabilities, New York, The Macmillan Company, 1928.

[23] W. H. Langdon and O. Ore, “Seminvariants and Sheppard's Corrections,” Annals of Math., Vol. 31 (1930), pp. 230-232.

[24] J. R. Abernethy, “On the Elimination of Systematic Errors Due to Grouping,” Annals of Math. Stat., Vol. 4 (1933), pp. 263-277.

[25] C. C. Craig, “Sheppard's Corrections for a Discrete Variable,” Annals of Math. Stat., Vol. 7 (1936), pp. 55-61.

[26] H. C. Carver, “The Fundamental Nature and Proof of Sheppard's Adjustments,” Annals of Math. Stat., Vol. 9 (1936), pp. 154-163.

[27] M. Sasuly, Trend Analysis of Statistics, Washington, The Brookings Institution (1934).


The University of Michigan 
Ann Arbor, Michigan 



THE ANALYSIS OF VARIANCE WHEN EXPERIMENTAL ERRORS 
FOLLOW THE POISSON OR BINOMIAL LAWS 

By W. G. Cochran 

1. Introduction. The use of transformations has recently been discussed by
several writers [1], [2], [3], [4], in applying the analysis of variance to
experimental data where there is reason to suspect that the experimental errors are
not normally distributed. Two types of transformations appear to be coming
into fairly common use: $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$. The former is considered
appropriate where the data are small integers whose experimental errors follow the
Poisson law, while the latter applies to fractions or percentages derived from
the ratio of two small integers, where the experimental errors follow the binomial
frequency distribution. In each case the object of the transformation is to put
the data on a scale in which the experimental variance is approximately the
same on all plots, so that all plots may be used in estimating the standard error
of any treatment comparison. The extent to which these transformations are
likely to succeed in so doing has been examined by Bartlett [2]. The object of
the present paper is to discuss the theoretical basis for these transformations in
more detail, and in particular to examine their relation to a more exact analysis.

2. Experimental variation of the Poisson type. The first step in an exact
statistical analysis of the results of any field experiment is to specify in
mathematical terms (1) how the expected values on each plot are obtained in terms of
unknown parameters representing the treatment and block (or row and column)
effects, and (2) how the observed values on the plots vary about the expected values.
In this section, the variation is assumed to follow the Poisson law.

The specification of the expected values requires some consideration. In the
standard theory of the analysis of variance, treatment and block (or row and
column) effects are assumed to be additive. In the case of a Latin square, for
example, the expected yield $m_i$ of the ith plot, which receives the tth treatment
and occurs in the rth row and the cth column, is written

(1) $m_i = G + T_t + R_r + C_c$

where G is a parameter representing the average level of yield in the experiment,
and $T_t$, $R_r$ and $C_c$ represent the respective effects of the treatment, row and
column to which the plot corresponds. Since the T, R and C constants are
required only to measure differences between different treatments, rows and
columns, we may put

(2) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0$.




If the experimental errors are normally and independently distributed with 
equal variance, this specification leads to very simple equations of estimation 
for the unknown parameters, the maximum likelihood estimate of $T_t$, for
example, being the difference between the mean yield of all plots receiving that 
treatment and the general mean. In addition to its simplicity, this type of 
prediction formula is fairly suitable for general use, because it gives a good 
approximation to most types of law which might be envisaged, provided that 
row and column differences are small in relation to the mean yield. However, 
in considering an exact analysis with Poisson variation, the prediction formula 
is assumed chosen, without reference to computational simplicity, as being the 
most suitable to describe the combined actions of treatment and soil effects. 

The probability of obtaining a given set of plot yields $x_i$ with expectations $m_i$
may be written

$\prod_i \frac{e^{-m_i} m_i^{x_i}}{x_i!}$.

Thus L, the logarithm of the likelihood, is given by

(3) $L = \sum_i (x_i \log m_i - m_i) - \sum_i \log x_i!$.

Hence the maximum likelihood equation of estimation for any parameter $\theta$
assumes the form

(4) $\sum_i \frac{(x_i - m_i)}{m_i}\frac{\partial m_i}{\partial\theta} = 0$,

where the summation extends over all plots whose expectations involve $\theta$. The
function $\partial m_i/\partial\theta$ will usually involve a number of parameters. Since the
specification of row, column and treatment effects in a 6 x 6 Latin square requires 16
independent parameters, the solution of these equations may be expected to be
laborious, though it may be shortened by the intelligent use of iterative methods.
The problem of obtaining exact tests of significance is also difficult. The
method of maximum likelihood provides estimates of the variances and
covariances of the treatment constants, which under certain conditions can be
assumed to be normally distributed if there is sufficient replication, but this can
hardly be considered an exact “small sample” solution.

These remarks show that the exact solution is somewhat too complicated for
frequent use. The difficulty arises principally because the typical equation of
estimation consists of a weighted sum of the deviations of the observed from the
expected values, the weights being $\frac{1}{m_i}\frac{\partial m_i}{\partial\theta}$. The factor
$1/m_i$ was introduced into the weight by the Poisson variation of the experimental
errors, and must be retained in any theory which claims to apply to Poisson
variation. It is, however, worth considering whether some simplification cannot be
introduced into the equations by assuming some particular form for the prediction
formula. This line of approach seems promising when one considers the simplification
introduced into the “normal theory” case by assuming the prediction formula
to be linear.

For Poisson variation, the linear law does not appear to be particularly
suitable, since it may give negative expectations on some plots (as happens in the
numerical example considered in the next section). Further, while
$\partial m_i/\partial\theta$ becomes a constant, the factor $1/m_i$ remains in the weight.

The entire weight can be made constant by assuming a linear prediction
formula in the square roots and transforming the data to square roots. For a
Latin square, this prediction formula is written

(5) $\sqrt{m_i} = \alpha_i = G + T_t + R_r + C_c$,

where

(6) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0$.

To find the maximum value of (3) subject to the restrictions (6), we may use the
method of undetermined multipliers, maximizing

(7) $L + \lambda\left(\sum_t T_t\right) + \mu\left(\sum_r R_r\right) + \nu\left(\sum_c C_c\right)$.

The equation of estimation for a typical treatment constant $T_t$ becomes

(8) $\sum_{T_t}\left(\frac{x_i - m_i}{m_i}\right)\frac{dm_i}{d\alpha_i}\frac{\partial\alpha_i}{\partial T_t} + \lambda = \sum_{T_t}\left(\frac{x_i - m_i}{m_i}\right)\frac{dm_i}{d\alpha_i} + \lambda = 0$,

the summation being extended over all plots receiving the treatment. If
$a_i = \sqrt{x_i}$, then by Taylor's theorem

(9) $x_i - m_i = (a_i - \alpha_i)\frac{dm_i}{d\alpha_i} + \frac{1}{2}(a_i - \alpha_i)^2\frac{d^2 m_i}{d\alpha_i^2} + \cdots$

If $m_i$ is reasonably large, only the first term on the right-hand side need be
retained. When $m_i$ is small, we may use, instead of the exact square root, a
quantity $a'_i$ defined so that

(10) $x_i - m_i = (a'_i - \alpha_i)\frac{dm_i}{d\alpha_i} = 2\sqrt{m_i}\,(a'_i - \alpha_i)$.

Thus if the analysis is performed on the quantities $a'_i$ instead of on the original
data, equation (8) becomes

(11) $\sum_{T_t} 4(a'_i - \alpha_i) + \lambda = 0$.



On substituting the expectations for $\alpha_i$ from (5), and using (6), we obtain

(12) $\sum_{T_t} 4(a'_i - G - T_t) + \lambda = 0$.

The corresponding equation for G is

(13) $\sum 4(a'_i - G) = 0$,

so that G is the general mean of the quantities a'. By adding equations (12)
over all treatments, and comparing the total with (13), we find $\lambda = 0$. Hence
$T_t$ is the difference between the mean yield of a' over all plots receiving $T_t$ and
the general mean of a'. In this scale the simplicity of the “normal theory”
equations has apparently been recovered. Actually, the quantities a' are not
known exactly, since

(14) $a' = \alpha + \frac{(x - m)}{2\sqrt{m}}$,

where $\alpha$ is the expected value of $\sqrt{x}$. However, this process provides a means
of successively approximating the maximum likelihood solution, by choosing
first approximations to the quantities $\alpha$, constructing the a''s, solving for the
unknown constants and hence obtaining second approximations to the expected
values. The close relation of a' to $\sqrt{x}$ is seen by remembering one of the
common rules for finding square roots. This consists in guessing an approximate
root ($\alpha$), dividing x by the approximate root, and taking the mean of the
approximate root ($\alpha$) and the resulting quotient ($x/\alpha$).
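
The equivalence between equation (14) and the square-root rule can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical and not part of the paper, and m is taken to be $\alpha^2$ as in (5).

```python
import math

def adjusted_root(x, alpha):
    """Square-root rule: mean of the guess alpha and the quotient x / alpha."""
    return 0.5 * (alpha + x / alpha)

def adjusted_root_eq14(x, alpha):
    """Equation (14) written out directly, assuming m = alpha**2."""
    m = alpha ** 2
    return alpha + (x - m) / (2.0 * math.sqrt(m))

if __name__ == "__main__":
    # (observed count, predicted root) pairs chosen for illustration
    for x, alpha in [(3, 2.046), (0, 0.786), (9, 3.0)]:
        print(x, alpha, round(adjusted_root(x, alpha), 3),
              round(adjusted_root_eq14(x, alpha), 3))
```

Both functions return the same value, and when the guess is already the exact root (x = 9, $\alpha$ = 3) the adjusted value is unchanged.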

The suitability of the linear prediction formula in square roots must be
considered in any example in which the above analysis is being employed. The
law is intermediate in its effects between the linear law and the product law in
the original data. My experience is that it is fairly satisfactory for general use
(cf. [2], p. 72). An exception may occur when it is desired to test the
interaction between two treatments, both of which produce large effects. In this
case the definition chosen for absence of interaction may not coincide at all
closely with the definition implied in using the linear law in square roots. An
example of this case was given in a previous paper [1].

In this connection it should be noted that an approximate “goodness of fit”
test may be obtained of the validity of the assumptions made. Since the
quantities $a'_i$ enter into the equations of estimation with weight 4, the quantity
$4\sum (a'_i - \alpha_i)^2$ is distributed approximately as $\chi^2$ with the number of degrees
of freedom in the error term of the analysis of variance. Some idea of the
closeness of the approximation may be gathered by considering the simplest
case in which only the mean yield is being estimated. In this case the observed
values x are assumed to be drawn from the same Poisson distribution, and the
sufficient statistic for the mean G is known to be $\sum(x_i)/n$. Since, however, the
prediction formula is here the same in square roots as in the original scale, and
since the maximum likelihood solution is invariant to change of scale, the mean
value $\alpha$ of a' must be exactly $\sqrt{\sum(x)/n}$, as the reader may verify by working
any particular example. Thus $\sum 4(a' - \alpha)^2$ is found to be $\sum (x - \bar{x})^2/\bar{x}$, the
usual $\chi^2$ test for examining whether a set of values x may reasonably be assumed
to come from the same Poisson distribution. By working out the exact
distribution of $\sum(x_i - \bar{x})^2/\bar{x}$ in a number of cases [5], I previously expressed the
opinion that this quantity followed the $\chi^2$ distribution sufficiently closely for
most practical uses, even for values of the mean as low as 2. This opinion has
since been substantiated by Sukhatme [6], who sampled this distribution for
m = 1, 2, 3, 4, and 5.
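
The identity between $4\sum(a' - \alpha)^2$ and the usual dispersion statistic in the single-mean case can be verified on any sample. The sketch below is a minimal illustration; the function names are hypothetical, and the sample counts (borrowed from the first row of Table I) serve only as input and are not claimed to come from a single Poisson distribution.

```python
import math

def dispersion_chi2(counts):
    """Sum of (x - mean)^2 / mean, the usual Poisson dispersion statistic."""
    mean = sum(counts) / len(counts)
    return sum((x - mean) ** 2 for x in counts) / mean

def chi2_from_adjusted_roots(counts):
    """4 * sum (a' - alpha)^2, with alpha = sqrt(mean) and a' from equation (14)."""
    mean = sum(counts) / len(counts)
    alpha = math.sqrt(mean)
    a_prime = [0.5 * (alpha + x / alpha) for x in counts]
    return 4.0 * sum((a - alpha) ** 2 for a in a_prime)

counts = [3, 2, 5, 1, 4]  # illustrative sample only
print(dispersion_chi2(counts), chi2_from_adjusted_roots(counts))
```

The two printed values agree up to rounding error, since $a' - \alpha = (x - m)/(2\sqrt{m})$ with $m = \bar{x}$.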

A high value of $\chi^2$ means either that the prediction formula is not satisfactory
or that the experimental errors are higher than the Poisson distribution
indicates, or that both causes are operating. These effects can sometimes be
separated by examining whether the observed yields deviate from the expected
yields in a systematic or a random manner. If the deviation is systematic, the
prediction formula is probably unsatisfactory.

The type of approach used above resembles in many features the “exact”
analysis for the probit transformation [7]. The principal difference is that in
the case of probits the transformation is made to suit the a priori prediction
formula, which postulates that the probits are a linear function of the dosage,
or of the log (dosage). Thus with probits the equations of estimation still
involve weights in the transformed scale. These do not seriously complicate
the analysis, since only two parameters require to be estimated for a given
poison. With, however, the much greater number of parameters usually
involved in specifying the results of a field experiment, the attractiveness of a
solution which does not involve weighting is greatly increased.

3. Numerical example of the square root transformation. A 5 X 5 Latin
square experiment on the effects of different soil fumigants in controlling
wireworms was selected as an example. The average number of wireworms per
plot (total of four soil samples) was just under five. Previous studies [8], [9]
have indicated that with small numbers per sample, the distribution of numbers
of wireworms tends to follow the Poisson law.

The plan and yields are shown in Table I. The first two figures under the
treatment symbols are the numbers of wireworms and their square roots
respectively, the latter being regarded as first approximations to the values a'. Two
of the plots receiving treatment K gave no wireworms. Since these plots are
likely to be changed most in the transition from square roots to a', better
approximations were estimated for them before proceeding with the calculations.
The best simple approximations appeared to be obtained from the square roots
of the means in the original units. For the plot in the second row and second
column, the square roots of the row, column and treatment means in the original
units are respectively 2.000, 2.145 and 1.095, and the square root of the general
mean is 2.227. Hence

$a' = \tfrac{1}{2}[2.000 + 2.145 + 1.095 - 2(2.227)] = 0.39$.

The other zero value was similarly found to give a' = 0.79. The corresponding
estimates from the means of the square roots were considerably too low, since
the a' values tend to be higher than the square roots. The use of “missing plot”
technique gave very poor approximations, because it ignores the fact that the
plots in question had zero yields.


TABLE I

Plan and number of wireworms per plot

                                    Column
Row                      1        2        3        4        5       Mean

 1   Treatment           P        O        N        K        M
     Number (1)          3        2        5        1        4
     Square root (2)     1.73     1.41     2.24     1.00     2.00    1.676
     2nd approx. (3)     1.76     1.45     2.25     1.11     2.00    1.714
     3rd approx. (4)     1.77     1.46     2.25     1.10     2.00    1.716

 2   Treatment           M        K        O        N        P
     Number              6        0        6        4        4
     Square root         2.45    (0.39)    2.45     2.00     2.00    1.858
     2nd approx.         2.45     0.32     2.50     2.02     2.02    1.862
     3rd approx.         2.46     0.32     2.49     2.02     2.02    1.862

 3   Treatment           O        M        K        P        N
     Number              4        9        1        6        5
     Square root         2.00     3.00     1.00     2.45     2.24    2.138
     2nd approx.         2.10     3.09     1.00     2.47     2.25    2.182
     3rd approx.         2.13     3.08     1.00     2.46     2.25    2.184

 4   Treatment           N        P        M        O        K
     Number             17        8        8        9        0
     Square root         4.12     2.83     2.83     3.00    (0.79)   2.714
     2nd approx.         4.18     2.84     2.83     3.00     0.77    2.724
     3rd approx.         4.17     2.84     2.83     3.00     0.77    2.722

 5   Treatment           K        N        P        M        O
     Number              4        4        2        4        8
     Square root         2.00     2.00     1.41     2.00     2.83    2.048
     2nd approx.         2.14     2.02     1.49     2.04     2.92    2.122
     3rd approx.         2.10     2.03     1.50     2.05     2.90    2.116

Mean Square root         2.460    1.926    1.986    2.090    1.972   2.087
     2nd approx.         2.526    1.944    2.014    2.128    1.992   2.121
     3rd approx.         2.526    1.946    2.014    2.126    1.988

Treatment Means

                         K        P        O        M        N
     Square root         1.036    2.084    2.338    2.456    2.520
     2nd approx.         1.068    2.116    2.394    2.482    2.544
     3rd approx.         1.058    2.118    2.396    2.484    2.544

(1) Original numbers. (2) Square roots. (3) Second approximations. (4) Third
approximations. Values in parentheses are the estimates inserted for the two
plots with zero yields.

With the estimated values inserted, the row, column, and treatment means
of the square roots are as shown in Table I. A second approximation to a'
was calculated for each plot. For the plot in the first row and the first column,
the expected yield is

$\alpha = 1.676 + 2.460 + 2.084 - 2(2.087) = 2.046$.

Hence $a' = \tfrac{1}{2}(2.046 + 3/2.046) = 1.76$. These values constitute the third set
of figures in Table I. Theoretically, it is advisable to readjust the row, column,
and treatment means after each new value of a' has been obtained, in order to
secure rapid convergence. This is rather laborious in practice, and a complete
set of new plot values was obtained before readjusting the means. The third
approximations obtained by this method are shown in the fourth lines in Table I
and are correct to two decimal places.
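
The iteration described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's computing routine: the names `PLAN`, `COUNTS`, `ZERO_STARTS` and the two functions are hypothetical, the data are copied from Table I, and unrounded means are used throughout, so the last printed digit may differ slightly from the hand-computed table.

```python
import math

# Plan and counts from Table I (one string or list per row of the square).
PLAN = ["PONKM", "MKONP", "OMKPN", "NPMOK", "KNPMO"]
COUNTS = [
    [3, 2, 5, 1, 4],
    [6, 0, 6, 4, 4],
    [4, 9, 1, 6, 5],
    [17, 8, 8, 9, 0],
    [4, 4, 2, 4, 8],
]
# Starting values for the two zero plots, as given in the text (0-based indices).
ZERO_STARTS = {(1, 1): 0.39, (3, 4): 0.79}

def first_approximation():
    """Square roots of the counts, with the text's estimates for the zero plots."""
    return [
        [ZERO_STARTS.get((r, c), math.sqrt(COUNTS[r][c])) for c in range(5)]
        for r in range(5)
    ]

def next_approximation(a):
    """Fit the additive law to the values a, then apply a' = (alpha + x/alpha) / 2."""
    n = 5
    grand = sum(map(sum, a)) / (n * n)
    row = [sum(a[r]) / n for r in range(n)]
    col = [sum(a[r][c] for r in range(n)) / n for c in range(n)]
    trt = {}
    for r in range(n):
        for c in range(n):
            trt[PLAN[r][c]] = trt.get(PLAN[r][c], 0.0) + a[r][c] / n
    new = []
    for r in range(n):
        new_row = []
        for c in range(n):
            # Expected value on the square-root scale for this plot.
            alpha = row[r] + col[c] + trt[PLAN[r][c]] - 2.0 * grand
            new_row.append(0.5 * (alpha + COUNTS[r][c] / alpha))
        new.append(new_row)
    return new

second = next_approximation(first_approximation())
third = next_approximation(second)
for values in second:
    print(" ".join(f"{v:5.2f}" for v in values))
```

Like the text, the sketch recomputes all plot values before readjusting the means, rather than readjusting after each plot.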

It is noteworthy how closely the square roots agree with the third
approximations on all plots except those which originally gave zero yields. The
differences between the second and third approximations are trivial.

The next step is to make a $\chi^2$ test by means of the quantity $4\sum(a' - \alpha)^2$.
From the manner in which the values $\alpha$ are constructed from the a''s, it follows
that $\sum(a' - \alpha)^2$ is simply the error sum of squares in the conventional analysis
of variance of the values a'. The analysis of variance of the third
approximations is shown in Table II.


TABLE II

Analysis of variance of adjusted square roots

              Degrees of freedom    Sum of squares    Mean square
Rows                   4                2.9815
Columns                4                1.1190
Treatments             4                7.5815           1.8954
Error                 12                4.5970           0.3831


The value of $\chi^2$ is 4 X 4.597 = 18.39, with 12 degrees of freedom, which is
just about the 10 percent level. If the hypothesis is regarded as disproved
only when $\chi^2$ exceeds the 5 percent level, the treatment means may be tested
by regarding them as approximately normally distributed with variance
1/5 X 0.25 = 0.05. It is, however, more prudent to use the actual error mean
square as an estimate of the experimental error variance, performing the usual
tests associated with the analysis of variance. This may be justified on the
grounds that the calculations have produced a set of plot values a' of equal
weight. On this basis the standard error of a treatment mean is
$\sqrt{0.3831/5} = 0.2768$. Treatment K reduced the number of wireworms significantly
below all other treatments, but there is no indication of any difference between the
other treatments. The treatment means may be reconverted to the original
units by squaring.
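
The arithmetic of this paragraph is easily reproduced. The short Python sketch below uses the figures of Tables I and II; the variable names are hypothetical, and the back-transformed values are simply the squares of the third-approximation treatment means.

```python
import math

error_ss = 4.5970          # error sum of squares, Table II
error_df = 12
plots_per_treatment = 5

chi2 = 4.0 * error_ss                      # each a' carries weight 4
error_ms = error_ss / error_df
se_theoretical = math.sqrt(0.25 / plots_per_treatment)   # variance 1/4 per plot
se_empirical = math.sqrt(error_ms / plots_per_treatment)

# Third-approximation treatment means (Table I), squared back to original units.
means = {"K": 1.058, "P": 2.118, "O": 2.396, "M": 2.484, "N": 2.544}
back_transformed = {name: value ** 2 for name, value in means.items()}

print(round(chi2, 2), round(se_theoretical, 4), round(se_empirical, 4))
print({name: round(value, 2) for name, value in back_transformed.items()})
```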


4. Experimental variation of the binomial type. In this case the yields are
obtained by examining a constant number n units per plot and noting those
which possess a certain attribute (e.g., plants which are diseased).
Experimental variation is presumed to arise solely from the binomial variation of the
observed fraction p possessing the attribute about the expected fraction P, which
is specified in terms of unknown parameters representing the treatment and
soil effects.

If $r_i$ is the number possessing the attribute on a typical plot, so that $p_i = r_i/n$,
the likelihood function takes the form

$\prod_i \frac{n!}{r_i!\,(n - r_i)!} P_i^{r_i} Q_i^{n - r_i}$.

Hence the terms in the logarithm which involve the unknown parameters are
given by

(15) $L = \sum_i \{r_i \log P_i + (n - r_i) \log Q_i\}$.

The equation of estimation for a typical constant $\theta$ is

(16) $\sum_i \frac{n(p_i - P_i)}{P_i Q_i}\frac{\partial P_i}{\partial\theta} = 0$,

where the summation is over all plots whose expectations involve $\theta$.

As in the Poisson case, an exact solution is laborious because of the weights
$\frac{n}{P_i Q_i}\frac{\partial P_i}{\partial\theta}$. The unequal weighting may be removed by transforming to the
variate $\alpha_i = \sin^{-1}\sqrt{P_i}$, and assuming that the prediction formula is linear
in the transformed scale. For a Latin square the prediction formula is assumed
to be

(17) $\alpha_i = G + T_t + R_r + C_c$

where the ith plot receives treatment t and lies in the rth row and cth column.
Further

(18) $\sum_t T_t = \sum_r R_r = \sum_c C_c = 0$.





Since $P_i = \sin^2\alpha_i$, $\frac{dP_i}{d\alpha_i} = 2\sqrt{P_i Q_i}$. A set of variates $a'_i$ is defined so that
on each plot

(19) $p_i - P_i = (a'_i - \alpha_i)\frac{dP_i}{d\alpha_i} = 2\sqrt{P_i Q_i}\,(a'_i - \alpha_i)$.

With these substitutions, the equation of estimation for $T_t$, for instance,
becomes

(20) $\sum_{T_t} 4n(a'_i - \alpha_i) + \lambda = 0$

where, as before, $\lambda$ is an undetermined multiplier. The remainder of the
solution proceeds exactly as in the Poisson case, $T_t$ being found to be the difference
between the mean value of $a'_i$ over all plots receiving this treatment and the
general mean of $a'_i$. A $\chi^2$ test may be made with $\sum_i 4n(a'_i - \alpha_i)^2$.

From (19)

(21) $a'_i = \alpha_i + \frac{p_i - P_i}{2\sqrt{P_i Q_i}} = \alpha_i + \frac{Q_i - q_i}{2\sqrt{P_i Q_i}}$

(22) $\quad = \alpha_i + \tfrac{1}{2}\cot\alpha_i - q_i\,\mathrm{cosec}\,(2\alpha_i)$

where $q_i$ is the observed fraction which does not possess the attribute. The
calculation of approximations to $a'_i$ thus involves finding a predicted value $\alpha_i$
from the treatment and block (or row and column) means, and using equation
(22). Tables [10] of the values of $\sin^{-1}\sqrt{P_i}$, $\alpha_i + \tfrac{1}{2}\cot\alpha_i$, and $\mathrm{cosec}\,(2\alpha_i)$
have been prepared to facilitate the computations. It should be noted that
these tables are in degrees, whereas the above equations assume that $\alpha_i$ is
measured in radians. In degrees, equation (20) above becomes

(23) $\sum_{T_t} \frac{4n\pi^2}{180^2}(a'_i - \alpha_i) + \lambda = 0$

while

(24) $a'_i = \alpha_i + \frac{180}{\pi}\{\tfrac{1}{2}\cot\alpha_i - q_i\,\mathrm{cosec}\,(2\alpha_i)\}$.

As in the Poisson case, the appropriateness of the linearly additive law in 
equivalent angles depends on the way in which treatment and soil effects operate. 
As Bliss has shown [11], the effect of the transformation is to flatten out the 
cumulative normal frequency distribution, extending the range over which it 
can be approximated by a straight line. 

5. Numerical example of the angular transformation. The data were selected
from a randomised blocks experiment by Carruth [12] on the control by
mechanical and insecticidal methods of damage due to corn ear worm larvae.
The control and the six types of mechanical protection were chosen for analysis,
the “yields” being the percentages of ears unfit for sale. The numbers of ears
varied somewhat from plot to plot, the average being 36.5, but the variations
were fairly small and appeared to be random. It was considered that
variations in the weight (4n) could be ignored in solving the equations of estimation.

TABLE III

Percentages of unfit ears of corn

                                        Blocks
Treatment                  I      II     III     IV      V      VI     Mean

 1   Percentage (1)       42.4   34.3   24.1   39.5   55.5   49.1
     Equiv. angle (2)     40.6   35.8   29.4   38.9   48.2   44.5    39.57
     2nd approx. (3)      40.7   36.0   29.4   38.9   48.6   44.6    39.70

 2   Percentage           23.5   15.1   11.8    9.4   31.7   15.9
     Equiv. angle         29.0   22.9   20.1   17.9   34.3   23.5    24.62
     2nd approx.          29.1   23.1   20.3   18.2   34.3   23.5    24.75

 3   Percentage           33.3   33.3    5.0   26.3   30.2   28.6
     Equiv. angle         35.2   35.2   12.9   30.9   33.3   32.3    29.97
     2nd approx.          35.5   35.3   14.5   31.0   33.4   32.4    30.35

 4   Percentage           11.4   13.5    2.5   16.6   39.4   11.1
     Equiv. angle         19.7   21.6    9.1   24.0   38.9   19.5    22.13
     2nd approx.          19.8   21.7   10.0   24.4   39.9   19.6    22.57

 5   Percentage           14.3   29.0   10.8   21.9   30.8   15.0
     Equiv. angle         22.2   32.6   19.2   27.9   33.7   22.8    26.40
     2nd approx.          22.6   32.7   19.2   28.0   33.7   22.9    26.52

 6   Percentage            8.5   21.9    6.2   16.0   13.5   15.4
     Equiv. angle         17.0   27.9   14.4   23.6   21.6   23.1    21.27
     2nd approx.          17.4   28.2   14.5   24.0   22.1   23.2    21.57

 7   Percentage           16.6   19.3   16.6    2.1   11.1   11.1
     Equiv. angle         24.0   26.1   24.0    8.3   19.5   19.5    20.23
     2nd approx.          24.3   26.2   28.8   10.9   20.1   19.5    21.63

Means (equiv. angle)      26.81  28.87  18.44  24.50  32.79  26.46   26.31

(1) Percentage. (2) Equivalent angle. (3) Second approximation.

The percentages of unfit ears, the equivalent angles and the second
approximations to a' are shown in descending order in Table III. The percentages on
individual plots vary from 2.1 to 55.5. The second approximations were
calculated from the block and treatment means of the angles. For the control plot
(treatment 1) in block I, for example, the expected value is

39.57 + 26.81 - 26.31 = 40.07.

Since Fisher and Yates’s tables of $\alpha + \tfrac{1}{2}\cot\alpha$ and $\mathrm{cosec}\,(2\alpha)$ are given for
values of $\alpha$ from 45° to 90°, we take the complement of the expected value,
which is 49.93. Interpolating mentally from the table, we find

$\alpha + \tfrac{1}{2}\cot\alpha = 74.0$, $\mathrm{cosec}\,(2\alpha) = 58.3$.

Thus the second approximation to the complement of the angle is

74.0 - 0.424 X 58.3 = 49.3.

Hence the second approximation to a' is 40.7, which agrees very closely with
the equivalent angle.
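
The tabular calculation can be cross-checked by evaluating equations (21) and (24) directly. The following Python sketch is illustrative only: the function names are hypothetical, and no tables are used, so the result for the control plot in block I comes out at about 40.6 rather than the 40.7 obtained above by mental interpolation.

```python
import math

def adjusted_angle_deg(alpha_deg, p):
    """Equation (24): a' in degrees from a predicted angle and observed fraction p."""
    alpha = math.radians(alpha_deg)
    q = 1.0 - p  # observed fraction not possessing the attribute
    correction = 0.5 / math.tan(alpha) - q / math.sin(2.0 * alpha)
    return alpha_deg + math.degrees(correction)

def adjusted_angle_direct_deg(alpha_deg, p):
    """Equation (21) in degrees: a' = alpha + (p - P) / (2 sqrt(PQ)), P = sin^2(alpha)."""
    alpha = math.radians(alpha_deg)
    big_p = math.sin(alpha) ** 2
    big_q = 1.0 - big_p
    return alpha_deg + math.degrees((p - big_p) / (2.0 * math.sqrt(big_p * big_q)))

# Control plot, block I: expected angle 40.07 degrees, observed fraction 0.424.
a_prime = adjusted_angle_deg(40.07, 0.424)
print(round(a_prime, 2), round(adjusted_angle_direct_deg(40.07, 0.424), 2))
```

The two forms agree, which is a direct check that (22) follows from (21).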

On the majority of the plots, the second approximation differs by only a 
trivial amount from the equivalent angle. The plots with the three lowest 
percentages (2.1, 2.5, and 5.0) have increased somewhat more, and also one or 
two other plots where the angles deviated considerably from the expected values. 
A third set of approximations was not considered necessary. 

The analysis of variance of the second approximations is given in Table IV. 


TABLE IV

              Degrees of freedom    Sum of squares    Mean squares
Blocks                 5                709.79
Treatments             6              1,531.56           255.26
Error                 30                982.67            32.76


Taking n as 36.5, the expected value of the error mean square is 820.7/36.5 =
22.48. Thus $\chi^2$ = 982.67/22.48 = 43.71, with 30 degrees of freedom, which is
almost exactly at the 5 percent level. This, together with the appreciable
amount of the variance removed by blocks, indicates that the experimental
error probably contains some element other than binomial variation. As in the
preceding case, it would be wise to make the usual analysis of variance tests
with the actual error mean square.
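
The constant 820.7 is the theoretical variance factor of an equivalent angle measured in degrees, $(180/\pi)^2/4$, corresponding to the weight 4n in radians. A short Python sketch of the calculation, with hypothetical variable names and the figures of Table IV, is given below; because it does not round the expected mean square to 22.48 before dividing, its $\chi^2$ differs from 43.71 in the second decimal place.

```python
import math

n = 36.5            # average number of ears per plot, from the text
error_ss = 982.67   # error sum of squares, Table IV
error_df = 30

variance_factor = (180.0 / math.pi) ** 2 / 4.0   # about 820.7
expected_ms = variance_factor / n                # binomial variance of an angle
chi2 = error_ss / expected_ms
observed_ms = error_ss / error_df

print(round(variance_factor, 1), round(expected_ms, 2),
      round(chi2, 2), round(observed_ms, 2))
```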

6. Discussion. It must be emphasised that the solutions given above apply
to the case where the whole of the experimental error variation is of the Poisson
or binomial type. The methods are therefore likely to be useful in practice only
where the experimental conditions have been carefully controlled, or where the
data are derived from such small numbers that the Poisson or binomial variation
is much larger than any extraneous variation. The $\chi^2$ test is helpful in deciding
whether this assumption is justified. Further, the examples worked above
indicate that the transformed values form very good approximations on most
plots. It will often be sufficient to adjust only those plots which give zero or
very small values in the Poisson case, or zero or 100 percent values in the
binomial case. In this connection the method of adjustment given above may
perhaps be considered as an improvement on the empirical rule given by Bartlett
[13] of counting n out of n as (n - 1/4) out of n.

Where extraneous variation becomes important, as is probably the normal 
case with data derived from field experiments, there seem to be no theoretical 
grounds for using the adjusted values. If we were prepared to describe accu¬ 
rately the nature of the variation other than that of the Poisson or binomial 
type, a new set of maximum likelihood equations could be developed. These 
would, however, lead to a different type of adjustment. 

The justification for the use of transformations has no direct relation to the
Poisson or binomial laws in this case, or in cases where percentages are derived
from the ratios of two weights or volumes, as in chemical analyses, or from an
arbitrary observational scoring. With percentages, for example, it may be
said, without describing the experimental variation in detail, that the variance
must vanish at zero and 100 percent and is likely to be greatest in the middle.
The formula $V = \lambda PQ$ is at least a first approximation to this situation. The
angular transformation will approximately equalize a distribution of variances
of this type, provided that $\lambda$ is sufficiently small. We have, of course, returned
to an “approximate” type of argument. It follows that the original data should
be scrutinized carefully before deciding that a transformation is necessary and
that any presumed opinions about the nature of the experimental variation
should be verified as far as possible.

7. Summary. This paper discusses the theoretical basis for the use of the 
square root and inverse sine transformations in analyzing data whose experi¬ 
mental errors follow the Poisson and binomial frequency laws respectively. 

The maximum likelihood equations of estimation are developed for each case, 
but are in general too complicated for frequent use. If, however, the expected 
yield of any plot is assumed to be an additive function of the treatment and 
soil effects in the transformed scale, a transformation can be found so that the 
equations of estimation assume the simple “normal theory” form. The trans¬ 
forms are closely related to the square roots and inverse sines respectively. 

The nature of the assumed formula for the expected values is briefly discussed, 
and a $\chi^2$ test is developed for the combined hypotheses that the prediction
formula is satisfactory and that the experimental errors follow the assumed law. 

Numerical examples are worked for both types of transformation. These 
indicate that even for data derived from small numbers, the square roots or 
inverse sines are good estimates of the correct transforms on almost all plots, 
except those which give zero yields in the Poisson case, or percentages near 
zero or 100 in the binomial case. 




In practice, these new methods are not recommended to supplant the simple
transformations for general use, because it can seldom be assumed that the
whole of the experimental error variation follows the Poisson or binomial laws.
The more exact analysis may, however, be useful (i) for cases in which the plot
yields are very small integers or the ratios of very small integers, (ii) in showing
how to give proper weight to an occasional zero plot yield.

REFERENCES 

[1] W. G. Cochran, “Some difficulties in the statistical analysis of replicated
experiments,” Empire J. Expt. Agric., Vol. 6 (1938), pp. 157-75.

[2] M. S. Bartlett, “The square root transformation in the analysis of variance,”
J. Roy. Stat. Soc. Suppl., Vol. 3 (1936), pp. 68-78.

[3] C. I. Bliss, “The transformation of percentages for use in the analysis of variance,”
Ohio J. Sci., Vol. 38 (1938), pp. 9-12.

[4] A. Clark and W. H. Leonard, “The analysis of variance with special reference to
data expressed as percentages,” J. Amer. Soc. Agron., Vol. 31 (1939), pp. 55-56.

[5] W. G. Cochran, “The $\chi^2$ distribution for the Binomial and Poisson series, with small
expectations,” Ann. Eugen., Vol. 7 (1936), pp. 207-17.

[6] P. V. Sukhatme, “On the distribution of $\chi^2$ in samples of the Poisson series,” J. Roy.
Stat. Soc. Suppl., Vol. 5 (1938), pp. 75-9.

[7] C. I. Bliss, “The determination of the dosage-mortality curve from small numbers,”
Quart. J. Pharmacy and Pharmacology, Vol. 11 (1938), pp. 192-216.

[8] A. W. Jones, “Practical field methods of sampling soil for wireworms,” J. Agric. Res.,
Vol. 54 (1937), pp. 123-34.

[9] W. G. Cochran, “The information supplied by the sampling results,” Ann. App.
Biol., Vol. 25 (1938), pp. 383-9.

[10] R. A. Fisher and F. Yates, Statistical tables for agricultural, biological and medical
research, Edinburgh, Oliver and Boyd, 1938.

[11] C. I. Bliss, “The analysis of field experimental data expressed in percentages,” Plant
Protection (Leningrad), 1937, pp. 67-77.

[12] L. A. Carruth, “Experiments for the control of larvae of Heliothis obsoleta Fabr.,”
J. Econ. Ent., Vol. 29 (1936), pp. 205-9.

[13] M. S. Bartlett, “Some examples of statistical methods of research in agriculture
and applied biology,” J. Roy. Stat. Soc. Suppl., Vol. 4 (1937), p. 168, footnote.


Iowa State College, 
Ames, Iowa 



NOTES 

This section is devoted to brief research and expository articles , notes on methodology 
and other short items . 


ORTHOGONAL POLYNOMIALS APPLIED TO LEAST SQUARE FITTING 
OF WEIGHTED OBSERVATIONS 


By Bradford F. Kimball 


1. Introduction. Let the independent variable be denoted by x, and let it
range over n consecutive integral values $x_1$ to $x_n$. Thus x represents the
index-number of the ordered intervals at which observations are taken, where
the intervals are all of equal length, and an index-number is assigned in
consecutive order to every interval within the range of investigation, whether
observations occur in that interval or not. Let $y_x$ denote the observation measure
(usually referred to as observed value), if such observation exists. Let $w_x$ denote
the weight of that observation, with weight zero assigned where observations
are lacking.

To shorten the notation, summation over all values of x from $x_1$ to $x_n$ will be
denoted by the sign $\sum$. If a subscript and superscript is used, the context will
indicate the variable to which the summation refers. The rth binomial coeffi¬ 
cient will be denoted by 

A system of polynomials $\phi_r(x)$, r = 0, 1, 2, 3, ... of degree r in x is said to be
an orthogonal system, for the purposes of this paper, if they satisfy the relations

(1) $\sum w_x \phi_r(x)\phi_s(x) = 0$ for $r \neq s$, and $\neq 0$ for $r = s$.

To construct the polynomials, one may write them in the form

(2) $\phi_0(x) = f_0(x) = \text{constant}$,
    $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} h_i \phi_i(x)$, r = 1, 2, 3, ...

where the $h_i$ are constants and the $f_r(x)$ are arbitrary polynomials of degree r.
It then follows from the conditions of orthogonality that

(3) $h_i = \frac{\sum w_x f_r(x)\phi_i(x)}{\sum w_x [\phi_i(x)]^2}$.




Thus when the polynomials $f_r(x)$ have been chosen for all $r$, the system of orthogonal polynomials for a given set of weights can be constructed and is uniquely determined except for a constant factor [1].

By virtue of the relation (2) and the conditions of orthogonality (1), it follows that

(4) $\sum w_x [\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x).$

Define the function $\Phi(r, k)$ by

(5) $\Phi(r, k) = \sum w_x f_r(x)\phi_k(x), \quad r = 0, 1, 2, 3, \cdots.$

It follows from the relations (2) and (3) that

(6) $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} \dfrac{\Phi(r, i)}{\Phi(i, i)}\,\phi_i(x),$

where it is to be noted that this summation is independent of $x$.

Define $q_r$ and $Y_r$ by

(7) $q_r = \sum w_x [\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x) = \Phi(r, r),$

(8) $Y_r = \sum w_x y_x \phi_r(x).$

Then if $u_r(x)$ represents the polynomial solution of degree $r$ of the normal equations set up for observed values $y_x$ and weights $w_x$,

(9) $u_r(x) = \dfrac{Y_0}{q_0} + \dfrac{Y_1}{q_1}\phi_1(x) + \cdots + \dfrac{Y_r}{q_r}\phi_r(x).$

If $E^2$ denotes the weighted sum of the squares of the discrepancies between the ordinates $u_r(x)$ of the fitted curve and the observed values $y_x$, then [2],

(10) $E^2 = \sum w_x [u_r(x) - y_x]^2 = \sum w_x y_x^2 - \sum_{i=0}^{r} \dfrac{Y_i^2}{q_i}.$

The practicability of the use of orthogonal polynomials is thus seen to depend upon whether the quantities $\Phi(r, k)$ and $Y_r$ can be evaluated in a reasonably simple manner.

The thesis of this paper is that if $f_r(x)$ is taken as the binomial coefficient $\binom{x}{r}$ one can effectively apply the method of orthogonal polynomials. This is made possible by the use of factorial moments in conjunction with an adding machine that prints cumulative totals.
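As an illustration only, the construction in (2), (3), (7), (8), (9) and (10) can be sketched in Python with $f_r(x) = \binom{x}{r}$ and $\phi_0(x) = 1$. The function name `orthogonal_fit` and the sample data are hypothetical, and the sketch assumes non-negative integer $x$ and more non-zero weights than the degree fitted.

```python
from math import comb

def orthogonal_fit(xs, ys, ws, degree):
    """Fit by formulas (2)-(10), taking f_r(x) = C(x, r) and phi_0(x) = 1."""
    phi, q, Y = [], [], []
    for r in range(degree + 1):
        f_r = [comb(x, r) for x in xs]
        vals = list(f_r)
        for i in range(r):
            # h_i = Phi(r, i) / q_i, formula (3)
            h = sum(w * f * p for w, f, p in zip(ws, f_r, phi[i])) / q[i]
            vals = [v - h * p for v, p in zip(vals, phi[i])]
        phi.append(vals)
        q.append(sum(w * v * v for w, v in zip(ws, vals)))          # (7)
        Y.append(sum(w * y * v for w, y, v in zip(ws, ys, vals)))   # (8)
    fitted = [sum(Y[r] / q[r] * phi[r][k] for r in range(degree + 1))
              for k in range(len(xs))]                              # (9)
    e2 = (sum(w * y * y for w, y in zip(ws, ys))
          - sum(Y[r] ** 2 / q[r] for r in range(degree + 1)))       # (10)
    return phi, q, fitted, e2

# Hypothetical data: weight zero marks intervals without observations.
xs = list(range(8))
ws = [1, 2, 0, 1, 3, 1, 0, 2]
ys = [2.0, 2.9, 0.0, 5.2, 7.1, 8.8, 0.0, 14.3]
phi, q, fitted, e2 = orthogonal_fit(xs, ys, ws, degree=2)
# Direct weighted sum of squared discrepancies, for comparison with (10).
direct_e2 = sum(w * (u - y) ** 2 for w, u, y in zip(ws, fitted, ys))
```

Under these assumptions `e2` and `direct_e2` should agree up to rounding, which is the content of formula (10).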

In treating the same problem Aitken sets up the normal equations in terms of factorials, but considers the explicit use of orthogonal polynomials impractical. He writes: "the arbitrary nature of the weights stands in the way of any analytical sophistication; orthogonal polynomials emerge, but are not of great use; and the necessity of solving the moment equations cannot be circumvented" [3]. He prefers a determinantal method of solution of the normal equations which the writer has found to be more involved, from a practical point of view, than the present method, although it is elegant from a theoretical standpoint.

Thus although the present method is not new from the point of view of theory, the writer has found that forms made up by the use of the technique suggested below offer an effective method for fitting polynomial curves to weighted observations.


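2. Simplification of the problem when $f_r(x) = \binom{x}{r}$. Factorial moments $S_r$ and $M_r$ are defined by

(11) $S_r = \sum w_x \binom{x}{r}, \qquad M_r = \sum w_x y_x \binom{x}{r}, \qquad r = 0, 1, 2, \cdots.$

These moments are not difficult to compute and are readily checked as computed. The formula for $\Phi(r, k)$ then becomes

(12) $\Phi(r, k) = \sum \binom{x}{r} w_x \phi_k(x).$

Thus since $\phi_0(x) = 1$, $\Phi(r, 0) = \sum w_x \binom{x}{r} = S_r$ and hence

$\phi_1(x) = x - \dfrac{\Phi(1, 0)}{\Phi(0, 0)} = x - \dfrac{S_1}{S_0}.$

Again

$\Phi(r, 1) = \sum \binom{x}{r} w_x \left(x - \dfrac{S_1}{S_0}\right) = (r + 1)S_{r+1} + rS_r - \dfrac{S_1}{S_0}S_r.$

Hence

$q_1 = \Phi(1, 1) = 2S_2 + \left(1 - \dfrac{S_1}{S_0}\right)S_1.$

A recursion formula for $\Phi(r, k)$ may be obtained by expanding $\phi_k(x)$ in formula (12) by means of (6). Thus

(13) $\Phi(r, k) = \sum w_x \binom{x}{r}\binom{x}{k} - \sum_{i=0}^{k-1} \dfrac{\Phi(k, i)}{q_i}\,\Phi(r, i).$

The first term can be easily expressed as a linear combination of binomial coefficients, and thus as a linear combination of moments $S_i$.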
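The relation $\Phi(r, 1) = (r+1)S_{r+1} + rS_r - (S_1/S_0)S_r$ can be checked numerically. The sketch below uses hypothetical weights on $x = 0, 1, \ldots, 7$ and exact fractions. It also shows one way repeated cumulative totals yield factorial moments; this rests on a standard summation identity and is an assumption about the kind of adding-machine layout meant, not a description taken from the paper.

```python
from fractions import Fraction
from math import comb

xs = list(range(8))              # assumed: x runs over 0, 1, ..., 7
ws = [1, 2, 0, 1, 3, 1, 0, 2]    # hypothetical weights

def S(r):
    """Factorial moment S_r of formula (11), computed directly."""
    return sum(w * comb(x, r) for x, w in zip(xs, ws))

def top_down_totals(values):
    """Cumulative totals accumulated from the last entry upward."""
    out, running = [], 0
    for v in reversed(values):
        running += v
        out.append(running)
    return out[::-1]

def S_by_summation(r):
    """S_r read off after r + 1 rounds of cumulative totals (entry at x = r)."""
    column = list(ws)
    for _ in range(r + 1):
        column = top_down_totals(column)
    return column[r]

def Phi_r1_direct(r):
    """Phi(r, 1) from formula (12) with phi_1(x) = x - S_1/S_0."""
    shift = Fraction(S(1), S(0))
    return sum(comb(x, r) * w * (x - shift) for x, w in zip(xs, ws))

def Phi_r1_from_moments(r):
    """Phi(r, 1) expressed through factorial moments only."""
    return (r + 1) * S(r + 1) + r * S(r) - Fraction(S(1), S(0)) * S(r)
```

The moment form needs only the $S_r$, which is what makes a tabular computation practical.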





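The formula for $Y_r$ can be broken down as follows:

(14) $Y_0 = \sum w_x y_x = M_0,$

$Y_r = \sum w_x y_x \phi_r(x) = \sum w_x y_x \binom{x}{r} - \sum_{i=0}^{r-1} \dfrac{\Phi(r, i)}{q_i} \sum w_x y_x \phi_i(x) = M_r - \sum_{i=0}^{r-1} \dfrac{\Phi(r, i)}{q_i}\,Y_i.$

Thus

$Y_1 = M_1 - \dfrac{\Phi(1, 0)}{q_0}\,Y_0,$

$Y_2 = M_2 - \dfrac{\Phi(2, 1)}{q_1}\,Y_1 - \dfrac{\Phi(2, 0)}{q_0}\,Y_0, \quad \text{etc.}$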


3. General technique of computation. In determining the best fitting polynomial of degree $r$, the ratios $\Phi(r, i)/q_i$ are seen to play an important part. In a form for calculation, these quantities should receive simple designations such as $b_i$ for a second degree curve, $c_i$ for a third degree curve, etc. Suppose they are designated by $R_i$ for a curve of degree $r$; then

(15) $\phi_r(x) = \binom{x}{r} - \sum_{i=0}^{r-1} R_i \phi_i(x),$

(16) $Y_r = M_r - \sum_{i=0}^{r-1} R_i Y_i,$

(17) $q_r = \sum w_x \binom{x}{r}^2 - \sum_{i=0}^{r-1} R_i \Phi(r, i),$

and in determining $\Phi(r, k)$ for $k = 0, 1, 2, \cdots, r - 1$, formula (13) may be written:

(18) $\Phi(r, k) = \sum w_x \binom{x}{r}\binom{x}{k} - \sum_{i=0}^{k-1} K_i \Phi(r, i),$

where $K_i = \Phi(k, i)/q_i$ are the corresponding ratios for the curve of degree $k$.

The fact that these quantities $R_i$ appear as multipliers in so many of the fundamental formulas greatly simplifies the mechanics of the calculation, especially when a calculating machine is used.

In the final determination of the polynomial curve, the differences of the polynomial at $x = 0$ are readily determined since the leading term of each orthogonal polynomial is a binomial coefficient, and thus

(19) $\Delta^k \phi_r(0) = -\sum_{i=0}^{r-1} R_i \Delta^k \phi_i(0), \quad k = 1, 2, 3, \cdots, r - 1; \qquad \Delta^r \phi_r(0) = 1.$




Since the effectiveness of the method depends upon the availability of an adding machine which records a cumulative subtotal, the determination of the curve from the differences at the point $x = 0$ is not a hardship and indeed affords a quick and accurate means of setting up the curve for purposes of plotting and checking.

(20) $u_r(0) = \dfrac{Y_0}{q_0} + \dfrac{Y_1}{q_1}\phi_1(0) + \cdots + \dfrac{Y_r}{q_r}\phi_r(0),$

$\Delta^k u_r(0) = \dfrac{Y_k}{q_k} + \dfrac{Y_{k+1}}{q_{k+1}}\Delta^k \phi_{k+1}(0) + \cdots + \dfrac{Y_r}{q_r}\Delta^k \phi_r(0),$

$\Delta^r u_r(0) = \dfrac{Y_r}{q_r}.$

The advantage of the use of orthogonal polynomials becomes particularly apparent when error formulae are to be used. The formula for the sum of the squares of the discrepancies, denoted by $E^2$, is given above (formula (10)). The estimated variance $V$ of the weighted observations about the fitted curve is thus $E^2/(n - r - 1)$ where $n$ is the number of values of $x$ used in fitting and $r$ is the degree of the curve fitted. Recalling that the matrix of the normal equations is of the diagonal form with diagonal elements $q_0, q_1, \cdots, q_r$, it follows that the coefficient $Y_k/q_k$ of $\phi_k(x)$ in the expansion of $u_r(x)$ has the variance $V/q_k$.

Furthermore the variance of the ordinate of the fitted curve $u_r(x)$ at a point $x$ due to sampling variations in the determination of the coefficients of the curve, under the assumption that the weights and values of the independent variable $x$ do not involve errors, has the simple form

(21) Variance of $u_r(x)$ at point $x$ $= V\left[\dfrac{1}{q_0} + \dfrac{\phi_1^2(x)}{q_1} + \cdots + \dfrac{\phi_r^2(x)}{q_r}\right]$

since the covariances of the orthogonal polynomials are zero [4].



REFERENCES 

[1] G. Szegö, Orthogonal Polynomials, Colloquium Publications, Amer. Math. Soc., New York, 1939, Vol. 23, Chapter II.

[2] Max Sasuly, Trend Analysis of Statistics, The Brookings Institution, Washington, 1934, pp. 296-297.

[3] A. C. Aitken, "On Fitting Polynomials to Weighted Data by Least Squares," Proc. Roy. Soc. of Edinburgh, Session 1933-34, Vol. 54, Part I, No. 1, p. 2.

[4] Henry Schultz, "The Standard Error of a Forecast from a Curve," Jour. Amer. Stat. Assn., June 1930; or Whittaker & Robinson, The Calculus of Observations, Blackie & Son Ltd., London, 1937, 2d ed., p. 242.


Port Washington, N. Y. 





COMBINATORIAL FORMULAS FOR THE rth STANDARD MOMENT 
OF THE SAMPLE SUM, OF THE SAMPLE MEAN, 

AND OF THE NORMAL CURVE 

By P. S. Dwyer 

The standard moments of the normal curve are usually expressed by the two statements [1, p. 97]

(1) $\alpha_{2s} = \dfrac{(2s)!}{2^s s!}, \qquad \alpha_{2s+1} = 0.$

It is of some interest to note that these two statements may be generalized into a single statement by observing that $\dfrac{(2s)!}{2^s s!}$ is the number of ways in which $2s$ things can be grouped in pairs and that 0 is the number of ways in which $2s + 1$ things can be grouped in pairs. It is obvious that an odd number of things cannot be grouped in pairs since there must be at least one unpaired unit. It is clear, too, that the number of orders in which $2s$ things can be grouped in pairs is $\binom{2s}{2}\binom{2s-2}{2}\binom{2s-4}{2}\cdots\binom{2}{2}$ and this is $\dfrac{(2s)!}{2^s}$. However if the resulting paired groups (rather than the orders of grouping) are counted it is seen that each paired grouping is repeated $s!$ times so that $\dfrac{(2s)!}{2^s s!}$ represents the number of ways $2s$ things can be grouped in pairs. If we arbitrarily define the number of ways 0 things can be grouped in pairs to be 1 (or if we limit our theorem to values of $r > 0$) we may say "The $r$th standard moment of the normal curve is equal to the number of ways in which $r$ things can be grouped in pairs."
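The counting argument above can be checked by direct enumeration. The following is a minimal sketch; the function names are hypothetical. It counts pairings recursively (the first unit can be paired with any of the remaining $r - 1$ units) and compares the count with the closed form $(2s)!/(2^s s!)$.

```python
from math import factorial

def pairings(r):
    """Number of ways r distinct things can be grouped in pairs."""
    if r == 0:
        return 1            # the convention adopted in the text for r = 0
    if r % 2 == 1:
        return 0            # an odd number always leaves one unit unpaired
    # Pair the first unit with one of the other r - 1, then pair the rest.
    return (r - 1) * pairings(r - 2)

def closed_form(r):
    """(2s)! / (2^s s!) for r = 2s, and 0 for odd r."""
    if r % 2 == 1:
        return 0
    s = r // 2
    return factorial(r) // (2 ** s * factorial(s))

counts = [pairings(r) for r in range(1, 9)]
```

For $r = 1, \ldots, 8$ the counts are 0, 1, 0, 3, 0, 15, 0, 105, the familiar standard moments of the normal curve.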

As presented above the combination representation is used primarily as a means of unification of results. However, it is possible to derive the standard moments of the normal curve in such a way as to indicate the term $\dfrac{(2s)!}{2^s s!}$ early in the proof and to trace it throughout the proof. I follow the method outlined by H. C. Carver [2] in obtaining the normal distribution as the limit of the distribution of sample sums (or of sample means) though I use a somewhat different notation [3, p. 5]. If we let $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ represent the number of ways in which $r$ units can be collected with $\pi_1$ groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc., then the multinomial theorem can be expressed as [3, p. 17]

(2) $(1)^r = \sum \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} (p_1^{\pi_1} \cdots p_s^{\pi_s}),$

where the summation is taken over all possible partitions $p_1^{\pi_1} \cdots p_s^{\pi_s}$ of $r$ and the expression $(p_1^{\pi_1} \cdots p_s^{\pi_s})$ represents the power product form [3, p. 14] which is $\pi_1!\,\pi_2! \cdots \pi_s!$ times the monomial symmetric function. If $\rho$ represents the number of parts of the partition then

$\rho = \pi_1 + \pi_2 + \cdots + \pi_s$

while

$r = p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s.$

Now it can be shown from (2) in the case of infinite sampling that

(3) $\mu_{r:(1)} = \sum \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, n^{(\rho)}\, \mu_{p_1}^{\pi_1} \cdots \mu_{p_s}^{\pi_s},$

and since $\mu_1 = 0$, it is only necessary to sum over all partitions which have no unit part. We have then, dividing by $[\mu_{2:(1)}]^{r/2} = [n\mu_2]^{r/2}$,

(4) $\alpha_{r:(1)} = \sum \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, \dfrac{n^{(\rho)}}{n^{r/2}}\, \alpha_{p_1}^{\pi_1} \cdots \alpha_{p_s}^{\pi_s}.$

We have now a formula for the $r$th standard moment of the sample sum which is expressed essentially in combination notation since the quantity $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ represents the number of ways in which $r$ units can be grouped to form $\pi_1$ groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc. All non-unitary groupings of $r$ are formed, each combinatorial coefficient is computed and multiplied by $n^{(\rho)}/n^{r/2}$ times the product of the corresponding $\alpha$'s, and the sums are formed. It might be noted that the formula for the $r$th standard moment of the sample mean is identical with (4) while the corresponding finite sampling (without replacements) formula is



(5) 


and) 


i \ N p Ppfi.-.pf* / n TI / \f, 

■ ^ \J>P • • • vp) ” ( “ Pl) ' ■' M 


The $P$'s are defined in previous papers [2, p. 105-6] [3, p. 113].

We obtain the formula for the $r$th standard moment of the normal curve by taking the limit of (4) as $n \to \infty$. (H. C. Carver has pointed out [2, p. 121] that this method of derivation imposes fewer restrictions than does the derivation from Hagen's hypothesis.) Each partition term will approach zero as $n$ approaches infinity if $\rho < \tfrac{1}{2}r$. Now the only non-unitary partition in which $\rho$ is not less than $\tfrac{1}{2}r$ is the partition $2^{r/2}$ and we can have this partition only when $r$ is even. Now the limit as $n$ approaches infinity of $n^{(\rho)}/n^{r/2}$ is then unity and we have, in the limiting case,

$\alpha_r = \begin{cases} \dbinom{r}{2^{r/2}} = \dfrac{r!}{2^{r/2}\,(r/2)!} & \text{if } r \text{ is even,} \\ 0 & \text{if } r \text{ is odd.} \end{cases}$

Since $\dbinom{r}{2^{r/2}}$ is the number of ways $r$ units can be grouped in pairs when $r$ is even and since 0 is the number of ways $r$ units can be grouped in pairs when $r$ is odd, it follows that the $r$th standard moment of the normal curve is the number of ways in which $r$ units can be grouped in pairs.

This development is of interest in that it makes possible the tracing of the value $\dfrac{r!}{2^{r/2}\,(r/2)!}$ back through the various stages of the development to the coefficient of $(2^{r/2})$ in the power product expansion of the multinomial theorem.


REFERENCES 

[1] H. C. Carver, "Frequency Curves," Chapter VII of Handbook of Mathematical Statistics.

[2] H. C. Carver, "Fundamentals of the theory of sampling," Annals of Math. Stat., Vol. 1 (1930), pp. 101-121.

[3] P. S. Dwyer, "Combined Expansions of Products of Symmetric Power Sums and of Sums of Symmetric Power Products with Application to Sampling," Annals of Math. Stat., Vol. 9 (1938), pp. 1-47, 97-132.


University of Michigan, 
Ann Arbor, Michigan 


ON A METHOD OF SAMPLING¹


By E. G. Olds 


It is recorded that Diogenes fared forth with a lantern in his search for an 
honest man. History does not tell us how many dishonest men he encountered 
before he found the first honest one but, judging from the fact that he took his 
lantern, apparently he expected to have a long search. The general problem of 
sampling inspection, of which the above is a special case, can be stated as follows: 

Given a lot, of size $m$, containing $s$ items of a specified kind. If items are to be drawn without replacement until $i$ of the $s$ items have been drawn, how many drawings, on the average, will be necessary?

Uspensky² has solved a problem concerning balls in an urn, from which the answer to the above question can be obtained for the special case $i = 1$. For the general case, the distribution for the number $n$ of the drawing in which the $i$th specified item appears is given by terms of the series:

(1) $\sum_{n} \dfrac{C_{n-1,\,i-1}\; C_{m-n,\,s-i}}{C_{m,\,s}},$

¹ Presented to The Institute of Mathematical Statistics, Dec. 27, 1938, at Detroit, Mich., as part of a paper entitled "Remarks on two methods of sampling inspection."

² J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937, p. 178.





where the first symbol indicates the number of ways of choosing $i - 1$ of the specified items to fill the first $n - 1$ places, the second symbol indicates the number of ways of disposing of $s - i$ specified items in the last $m - n$ places, and the denominator gives the number of ways that the $s$ items can be scattered through the lot. In order to get the average number of draws we multiply each term of (1) by $n$ and sum. Then we have

(2) $\nu_1' = \sum_{n} \dfrac{n\,C_{n-1,\,i-1}\,C_{m-n,\,s-i}}{C_{m,\,s}} = \dfrac{i(m+1)}{s+1} \sum_{n} \dfrac{C_{n,\,i}\,C_{m-n,\,s-i}}{C_{m+1,\,s+1}} = \dfrac{i(m+1)}{s+1}.$

Example 1. On a table of 200 bargain shirts there are 5 which have a 15 in. neckband and 35 in. sleeves. How many shirts must be examined, on the average, to find two of the desired kind?

Solution. For this case, $m = 200$, $s = 5$, $i = 2$. Therefore $\bar{n} = [2(201)]/6 = 67$. Thus, an average of 67 shirts must be examined.
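Example 1 can be verified from the distribution itself. The sketch below is illustrative; the function name `draw_distribution` is hypothetical. It builds the exact probabilities of the drawing number $n$ with rational arithmetic and compares the mean with $i(m+1)/(s+1)$.

```python
from fractions import Fraction
from math import comb

def draw_distribution(m, s, i):
    """P(the i-th specified item appears on draw n), for a lot of m holding s."""
    total = comb(m, s)
    return {n: Fraction(comb(n - 1, i - 1) * comb(m - n, s - i), total)
            for n in range(i, m - s + i + 1)}

m, s, i = 200, 5, 2                      # Example 1: shirts on the bargain table
dist = draw_distribution(m, s, i)
mean = sum(n * p for n, p in dist.items())
variance = sum(n * n * p for n, p in dist.items()) - mean ** 2
```

Here `mean` equals 67 exactly, in agreement with formula (2), and the probabilities sum to one.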

Suppose $\mu_K$ represents the $K$th moment about the mean, $\nu_K$ the $K$th moment about the origin, and $\nu_K'$ the moment relation given by

(3) $\nu_K' = (\nu + K - 1)^{(K)},$

where $(\nu + K - 1)^{(K)}$ represents the result of expanding $(\nu + K - 1)^{(K)}$ and changing the exponent of $\nu$ to the corresponding subscript. (For example, $\nu_3' = (\nu + 2)^{(3)} = \nu_3 + 3\nu_2 + 2\nu_1$.) It is easy to derive the recurrence relation

(4) $\nu_K' = \dfrac{(i + K - 1)(m + K)}{s + K}\,\nu_{K-1}'.$

From this result the computation of the moments about the mean is theoretically direct. Actually the results do not seem to be very compact. The variance is given by

(5) $\mu_2 = \dfrac{(m + 1)(m - s)}{(s + 1)^2 (s + 2)}\,[\,i(s + 1) - i^2\,].$


In case $s$ is unknown and $n$ is known for a particular value of $i$, we may estimate $s$ (or rather $\dfrac{1}{s+1}$) by using the relation $n = \dfrac{i(m+1)}{s+1}$. Then

(6) $\left[\dfrac{1}{s+1}\right]_{\text{est.}} = \dfrac{n}{i(m+1)},$

and the variance, using this estimate, is given by

(7) Variance of $\left[\dfrac{1}{s+1}\right]_{\text{est.}} = \dfrac{n}{n + i(m+1)} \cdot \dfrac{1}{i(m+1)} \left[\dfrac{n}{i} - 1\right]\left[1 - \dfrac{n}{m+1}\right].$


Example 2. In order to check a box of 144 screws, screws are drawn until 10 good screws are obtained. In a particular case only 10 drawings were necessary. Estimate the number of good screws in the lot.

Solution. Here $m = 144$, $i = 10$, $n = 10$. The estimate for $s$ is obtained from $\left[\dfrac{1}{s+1}\right]_{\text{est.}} = \dfrac{10}{10(145)} = \dfrac{1}{145}$ and, as is to be expected, the conclusion is that all the screws are good. Furthermore the variance of the estimated quantity is zero.

It is obvious that the number of draws necessary to obtain any particular number of specified items is correlated with the numbers of draws for lesser numbers of items. To investigate this, let us suppose that $n_j$ represents the number of draws to obtain exactly $j$ specified items and that $x_j = n_j - n_{j-1}$. It follows immediately from our previous results that

(8) $E(x_1) = E(x_2) = E(x_3) = \cdots = \dfrac{m+1}{s+1}.$

This result could be obtained from the fact that, corresponding to any arrangement of the lot for which $x_a = a$ and $x_b = b$, there is another arrangement where $x_a = b$ and $x_b = a$, formed by moving $a - b$ of the non-specified items from the first group to the second. From this fact we see, also, that

(9) $E(x_1^2) = E(x_2^2) = E(x_3^2) = \cdots.$

But $x_1 = n_1$ and $\sigma_{x_1}^2 = \dfrac{(m+1)(m-s)}{(s+1)^2(s+2)}\,[s + 1 - 1] = ds$. Therefore,

(10) $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \cdots = ds.$

But, from our previous formula we have

$\sigma_{n_2}^2 = d(2s - 2), \quad \sigma_{n_3}^2 = d(3s - 6), \quad \text{etc.}$

Since $n_2 = x_1 + x_2$, it follows that

$\sigma_{n_2}^2 = \sigma_{x_1}^2 + 2r_{x_1,x_2}\,\sigma_{x_1}\sigma_{x_2} + \sigma_{x_2}^2,$

where $r_{x_1,x_2}$ is the correlation between $x_1$ and $x_2$. Therefore,

(11) $r_{x_1,x_2} = -1/s.$

Also, since $x_1 = n_2 - x_2$, it follows that

(12) $r_{n_2,x_2} = \sqrt{\dfrac{s-1}{2s}}.$

Likewise, from $x_2 = n_2 - x_1$, we get

(13) $r_{n_2,x_1} = \sqrt{\dfrac{s-1}{2s}}.$
Finally, we obtain the three general results

(14) $r_{n_i,\,x_{i+1}} = -\sqrt{\dfrac{i}{s(s - i + 1)}},$

(15) $r_{n_i,\,x_i} = \sqrt{\dfrac{s - i + 1}{si}},$

(16) $r_{n_i,\,n_j} = \sqrt{\dfrac{i(s - j + 1)}{j(s - i + 1)}}, \quad i < j.$


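Example 3. The cards of a deck are turned one by one until two aces have appeared. The second ace appears when the 36th card is turned. How many more cards should one expect to have to turn to find a third ace?

Solution. Here $m = 52$, $s = 4$, $i = 2$, $n_2 = 36$. Then $\bar{n}_2 = 2 \cdot \dfrac{53}{5}$, $\bar{x}_3 = \dfrac{53}{5}$ and $r_{n_2,x_3} = -\sqrt{\dfrac{2}{4 \cdot 3}} = -\dfrac{1}{\sqrt{6}}$. Also $\sigma_{x_3} = \sqrt{4d}$ and $\sigma_{n_2} = \sqrt{6d}$. Since $\dfrac{x_3 - \bar{x}_3}{\sigma_{x_3}} = r_{n_2,x_3}\,\dfrac{n_2 - \bar{n}_2}{\sigma_{n_2}}$, we have

$x_3 = \dfrac{53}{5} - \dfrac{1}{\sqrt{6}} \cdot \dfrac{\sqrt{4}}{\sqrt{6}} \left(36 - \dfrac{106}{5}\right) = \dfrac{17}{3}.$

Of course this result could have been obtained more directly by noting that there were two aces left among the 16 remaining cards.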
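The two routes to the answer in Example 3 can be compared with exact arithmetic. In the sketch below (function names hypothetical), the regression slope $r_{n_i,x_{i+1}}\,\sigma_{x}/\sigma_{n_i}$ is written as $-1/(s + 1 - i)$, which follows from (14) together with $\sigma_x^2 = ds$ and $\sigma_{n_i}^2 = d\,i(s + 1 - i)$; the direct route applies the mean formula (2) to the cards that remain.

```python
from fractions import Fraction

def expected_gap_by_regression(m, s, i, n_i):
    """Expected x_{i+1} given n_i, from the linear regression of x_{i+1} on n_i."""
    mean_n = Fraction(i * (m + 1), s + 1)     # mean of n_i, formula (2)
    mean_x = Fraction(m + 1, s + 1)           # mean of x_{i+1}, formula (8)
    slope = Fraction(-1, s + 1 - i)           # r * sigma_x / sigma_n, simplified
    return mean_x + slope * (n_i - mean_n)

def expected_gap_direct(m, s, i, n_i):
    """Formula (2) with one item sought among the m - n_i cards left."""
    return Fraction(m - n_i + 1, s - i + 1)

by_regression = expected_gap_by_regression(52, 4, 2, 36)
direct = expected_gap_direct(52, 4, 2, 36)
```

Both quantities come out as 17/3, as in the text.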


Conclusion. The results given in this note might be useful when it is necessary to estimate the number of items to be drawn in order to secure a desired number of a particular type, such as may be the case in obtaining a sample with previously defined characteristics. Also the note disproves such intuitive notions as the one that when looking for a desired record, one is most likely to have to search the whole pile to find it. As far as methods of sampling inspection are concerned, the one implied in this note has little to recommend it.

Carnegie Institute of Technology, 

Pittsburgh, Pa. 


RANK CORRELATION WHEN THERE ARE EQUAL VARIATES¹

By Max A. Woodbury

If there is given a set of number pairs

(1) $(X_1, Y_1), (X_2, Y_2), \cdots, (X_N, Y_N),$

we may assign to each variate its "rank" (i.e. one more than the number of corresponding variates in the set greater than the given variate). In this way there is obtained a set of pairs of ranks

(2) $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N).$

¹ Presented at the fall meeting, Mich. section of the Math. Assn. of America, Nov. 18, 1939, Kalamazoo College.




If we assume that $X_i \ne X_j$ and $Y_i \ne Y_j$ when $i \ne j$ then it follows that each integer from 1 to $N$ appears once and only once in the $x$'s and the same holds for the $y$'s. This leads at once to the formulas:

(3a) $\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = N(N + 1)/2,$

(3b) $\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = N(N + 1)(2N + 1)/6.$

When these results are substituted in the expression for the product moment correlation coefficient we have after simplifying [1],

(4) $\rho = 1 - 6\sum_{i=1}^{N} D_i^2 / N(N^2 - 1)$ where $D_i = x_i - y_i$.

If we consider the case of equal variates and follow the rule for assigning ranks given in the first paragraph, the resulting method is known as the bracket-rank method. The use of (4) in the calculation of $\rho$ by this method is not strictly valid, because not every integer appears in the summations and so neither (3a) nor (3b) is true.

The more accurate mid-rank method assigns to each of the equal variates the average of the ranks that would be assigned if we were to give them an arbitrary order. This method preserves (3a) but not (3b). In this paper $\rho_m$ indicates the value of $\rho$ as calculated by (4) when the mid-rank method is used.

In a method due to DuBois [2], the equal variates are assigned the same rank so as to satisfy (3b). In this case (3a) is not satisfied.

If we assign the ranks to the equal variates in an arbitrary way, then (3a) and (3b) are of course satisfied and the use of (4) is valid. There are two disadvantages to such a method; first, the equal variates are treated differently, and second, the assignment of ranks is arbitrary. These difficulties are removed if one uses the average of the values of $\rho$ corresponding to all possible ways of arbitrarily assigning ranks to the equal variates. Since $\rho$ is linear in $\sum_i D_i^2$ the average value of $\rho$ may be obtained from the average value of $\sum_i D_i^2$ and the use of (4).

Let us first consider the simple case of two equal variates in one of the variables, say $X$. It is clear that there are only two possible ways of assigning ranks, and that if we arrange the series by the assigned $x$ ranks, the resulting series differ only in the $y$ ranks corresponding to the equal $X$ variates. If we denote the two $x$ ranks to be assigned by $m$ and $m + 1$ and the $y$'s corresponding for a particular arrangement by $y_m$ and $y_{m+1}$ we have for the average $\sum_i D_i^2$ the expression

(5a) $\sum_{x=1}^{m-1} (x - y_x)^2 + \sum_{x=m+2}^{N} (x - y_x)^2 + \tfrac{1}{2}\left[(m - y_m)^2 + (m + 1 - y_{m+1})^2 + (m - y_{m+1})^2 + (m + 1 - y_m)^2\right].$





By the mid-rank method the corresponding expression is

(5b) $\sum_{x=1}^{m-1} (x - y_x)^2 + \sum_{x=m+2}^{N} (x - y_x)^2 + (m + \tfrac{1}{2} - y_m)^2 + (m + \tfrac{1}{2} - y_{m+1})^2.$

The correction $\Delta_2$ to be added to the mid-rank $\sum_i D_i^2$ to get the average $\sum_i D_i^2$ is, by subtracting (5b) from (5a) and simplifying,

(6) $\Delta_2 = \tfrac{1}{2}.$

To get $\Delta_K$ in the more general case of several equal variates, we need only consider the difference between the average value of $\sum_i D_i^2$ and that obtained by the mid-rank method. If there are $K$ equal $X$ variates we may assign the ranks in $K!$ ways; this results in $K!$ permutations of the $y$ ranks for the sets arranged in order of their assigned $x$ ranks. In $(K - 1)!$ permutations $y_{m+j}$ corresponds to the $x$ rank of $m + i$ so that the correction to the mid-rank $\sum_i D_i^2$ is

(7) $\Delta_K = \dfrac{1}{K!}\left[(K - 1)! \sum_{j=0}^{K-1} \sum_{i=0}^{K-1} (m + i - y_{m+j})^2\right] - \sum_{j=0}^{K-1} \left(m + \dfrac{K - 1}{2} - y_{m+j}\right)^2$

$\quad = \dfrac{1}{K} \sum_{j=0}^{K-1} \sum_{i=0}^{K-1} (m + i - y_{m+j})^2 - \sum_{j=0}^{K-1} \left(m + \dfrac{K - 1}{2} - y_{m+j}\right)^2 = \dfrac{K(K^2 - 1)}{12}.$

It is to be noticed that the correction is positive and depends only on the number of equal $X$ variates. From this it can be concluded that for more than one group of equal variates, no matter whether $X$'s or $Y$'s, we can obtain the average $\sum_i D_i^2$ by computing a correction for each group and then adding these corrections to get the total correction to the mid-rank $\sum_i D_i^2$. Then as before noted we can by (4) calculate the average $\rho$ (denoted as $\bar{\rho}$).
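The claim that the average of $\sum D^2$ over all arbitrary rank assignments equals the mid-rank $\sum D^2$ plus a correction $K(K^2 - 1)/12$ per group can be checked by brute force on a small case. The sketch below is illustrative only: the data and all function names are hypothetical, and ties are placed in both variables.

```python
from fractions import Fraction
from itertools import permutations, product

def tie_groups(values):
    """For each distinct value (largest first), its positions and its block of ranks."""
    groups, next_rank = [], 1
    for v in sorted(set(values), reverse=True):
        idx = [k for k, u in enumerate(values) if u == v]
        groups.append((idx, list(range(next_rank, next_rank + len(idx)))))
        next_rank += len(idx)
    return groups

def mid_ranks(values):
    ranks = [None] * len(values)
    for idx, block in tie_groups(values):
        for k in idx:
            ranks[k] = Fraction(sum(block), len(block))
    return ranks

def all_rankings(values):
    """Every way of assigning the block of ranks arbitrarily within each tie group."""
    groups = tie_groups(values)
    for choice in product(*(permutations(block) for _, block in groups)):
        ranks = [None] * len(values)
        for (idx, _), perm in zip(groups, choice):
            for k, r in zip(idx, perm):
                ranks[k] = r
        yield ranks

def sum_d2(rx, ry):
    return sum((a - b) ** 2 for a, b in zip(rx, ry))

X = [5, 3, 3, 3, 1, 1]          # hypothetical variates with tie groups of 3 and 2
Y = [2, 2, 6, 4, 4, 1]          # two tie groups of 2
rx_all, ry_all = list(all_rankings(X)), list(all_rankings(Y))
average = Fraction(sum(sum_d2(rx, ry) for rx in rx_all for ry in ry_all),
                   len(rx_all) * len(ry_all))
sizes = [len(idx) for idx, _ in tie_groups(X)] + [len(idx) for idx, _ in tie_groups(Y)]
correction = sum(Fraction(K * (K * K - 1), 12) for K in sizes)
mid = sum_d2(mid_ranks(X), mid_ranks(Y))
```

In this example `average` equals `mid + correction` exactly; groups of size one contribute nothing to the correction.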

This correction to $\sum_i D_i^2$ may be converted into a correction to $\rho_m$. That is, if

(8) $\delta_{NK_i} = \dfrac{6\Delta_{K_i}}{N(N^2 - 1)} = \dfrac{K_i(K_i^2 - 1)}{2N(N^2 - 1)},$

then

$\bar{\rho} = \rho_m - \sum_i \delta_{NK_i},$

where the summation extends over all groups of equal variates, and $K_i$ is the number of equal variates in the $i$th group.

A table of $\delta_{NK}$ for different values of $N$ and $K$ is given, and also a table of $\Delta_K$. The values $\Delta_K$ are given in the top row of the table, while the $\delta_{NK}$ are given in the rows below.
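Entries of such a table follow directly from (7) and (8). A minimal sketch, with hypothetical function names:

```python
def delta_K(K):
    """Correction to the mid-rank sum of D^2 for one group of K equal variates, formula (7)."""
    return K * (K * K - 1) / 12

def delta_NK(N, K):
    """Corresponding correction to the mid-rank coefficient, formula (8)."""
    return K * (K * K - 1) / (2 * N * (N * N - 1))

# One row of delta_NK, for N = 14 and K = 2, ..., 13, rounded to four places.
row_14 = [round(delta_NK(14, K), 4) for K in range(2, 14)]
```

For $N = 14$ this gives 0.0011, 0.0044, 0.0110, 0.0220 for $K = 2, 3, 4, 5$.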






Table of $\Delta_K$ and $\delta_{NK}$

  K         2      3      4      5      6      7      8      9     10     11     12     13
Δ_K    0.5000  2.000      5     10   17.5     28     42     60   82.5    110    143    182

  N                                    δ_NK
  3     .1250      —      —      —      —      —      —      —      —      —      —      —
  4     .0500  .2000      —      —      —      —      —      —      —      —      —      —
  5     .0250  .1000  .2500      —      —      —      —      —      —      —      —      —
  6     .0143  .0571  .1429  .2857      —      —      —      —      —      —      —      —
  7     .0089  .0357  .0893  .1786  .3125      —      —      —      —      —      —      —
  8     .0060  .0238  .0595  .1190  .2083  .3333      —      —      —      —      —      —
  9     .0042  .0166  .0417  .0833  .1458  .2333  .3500      —      —      —      —      —
 10     .0030  .0121  .0303  .0606  .1061  .1697  .2546  .3636      —      —      —      —
 11     .0023  .0091  .0227  .0455  .0795  .1273  .1909  .2727  .3750      —      —      —
 12     .0017  .0070  .0175  .0350  .0612  .0979  .1469  .2098  .2885  .3846      —      —
 13     .0014  .0055  .0137  .0275  .0480  .0769  .1154  .1648  .2266  .3022  .3929      —
 14     .0011  .0044  .0110  .0220  .0385  .0615  .0923  .1319  .1813  .2418  .3143  .4000
 15     .0009  .0036  .0089  .0179  .0313  .0500  .0750  .1071  .1473  .1964  .2554  .3250
 16     .0007  .0029  .0074  .0147  .0257  .0412  .0618  .0882  .1213  .1618  .2103  .2676
 17     .0006  .0025  .0061  .0123  .0214  .0343  .0515  .0735  .1011  .1348  .1752  .2230
 18     .0005  .0021  .0052  .0103  .0181  .0289  .0433  .0619  .0851  .1135  .1476  .1878
 19     .0004  .0018  .0044  .0088  .0154  .0246  .0368  .0526  .0724  .0965  .1254  .1596
 20     .0004  .0015  .0038  .0075  .0132  .0211  .0316  .0451  .0620  .0827  .1075  .1368
 21     .0003  .0013  .0032  .0065  .0114  .0182  .0273  .0390  .0536  .0714  .0929  .1182
 22     .0003  .0011  .0028  .0056  .0099  .0158  .0237  .0339  .0466  .0621  .0807  .1028
 23     .0002  .0010  .0025  .0049  .0086  .0138  .0208  .0296  .0408  .0543  .0708  .0899
 24     .0002  .0009  .0022  .0043  .0076  .0122  .0183  .0261  .0359  .0478  .0622  .0791
 25     .0002  .0008  .0019  .0038  .0067  .0108  .0162  .0231  .0317  .0423  .0550  .0700
 26     .0002  .0007  .0017  .0034  .0060  .0096  .0144  .0205  .0282  .0376  .0489  .0622
 27     .0002  .0006  .0015  .0031  .0053  .0085  .0128  .0183  .0252  .0336  .0437  .0556
 28     .0001  .0005  .0014  .0027  .0048  .0077  .0115  .0164  .0226  .0301  .0391  .0498
 29     .0001  .0005  .0012  .0025  .0043  .0069  .0103  .0148  .0203  .0271  .0352  .0448
 30     .0001  .0004  .0011  .0022  .0039  .0062  .0093  .0133  .0184  .0245  .0318  .0405
 35     .0001  .0003  .0007  .0014  .0025  .0039  .0059  .0084  .0116  .0154  .0200  .0255
 40     .0000  .0002  .0005  .0009  .0016  .0026  .0039  .0056  .0077  .0103  .0134  .0171
 45     .0000  .0001  .0003  .0007  .0012  .0018  .0028  .0040  .0054  .0072  .0094  .0120
 50     .0000  .0001  .0002  .0004  .0007  .0011  .0016  .0023  .0032  .0043  .0055  .0070
 60     .0000  .0001  .0001  .0003  .0005  .0008  .0012  .0017  .0023  .0031  .0040  .0051
 70     .0000  .0000  .0001  .0002  .0003  .0005  .0007  .0010  .0014  .0019  .0025  .0032
 80     .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0010  .0013  .0017  .0021
 90     .0000  .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0009  .0012  .0015
100     .0000  .0000  .0000  .0000  .0001  .0002  .0003  .0004  .0005  .0007  .0009  .0011






As an example of the use of the table we will consider the following problem, [2, p. 56], with the ranks assigned as for the mid-rank method.

Subject      I      II
   A         1      2.5
   B         4     10
   C         4      2.5
   D         4      5
   E         4      7
   F         4      2.5
   G         7      8
   H         8      2.5
   I         9.5    6
   J         9.5   12
   K        11     11
   L        13     13
   M        13      9
   N        13     14

For the mid-rank method we have

$\sum_{i=1}^{14} D_i^2 = 119.5, \quad N = 14, \quad \rho_m = 1 - \dfrac{6(119.5)}{14(196 - 1)} = 0.7374.$

Referring to the table we find that

 K_i     Δ_{K_i}    δ_{NK_i}
  2        0.5       0.0011
  3        2.0       0.0044
  4        5.0       0.0110
  5       10.0       0.0220
Total     17.5       0.0385

We know that

$\bar{\rho} = 1 - \dfrac{6(119.5 + 17.5)}{14(196 - 1)} = 0.6989$

and in terms of $\delta_{NK_i}$

$\bar{\rho} = 0.7374 - 0.0385 = 0.6989.$

The value given by DuBois for his method is 0.7511.
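The arithmetic of this example can be reproduced from the two columns of mid-ranks. The sketch below is illustrative; variable names are hypothetical. It recomputes $\sum D^2$, finds the sizes of the groups of equal ranks, and applies the corrections $K(K^2 - 1)/12$.

```python
from collections import Counter

# Mid-ranks for subjects A through N, as tabulated in the example.
series_I = [1, 4, 4, 4, 4, 4, 7, 8, 9.5, 9.5, 11, 13, 13, 13]
series_II = [2.5, 10, 2.5, 5, 7, 2.5, 8, 2.5, 6, 12, 11, 13, 9, 14]
N = len(series_I)

sum_d2 = sum((a - b) ** 2 for a, b in zip(series_I, series_II))
rho_m = 1 - 6 * sum_d2 / (N * (N * N - 1))            # formula (4), mid-rank

# Sizes K_i of all groups of equal variates in either series.
groups = sorted(k for col in (series_I, series_II)
                for k in Counter(col).values() if k > 1)
total_delta = sum(K * (K * K - 1) / 12 for K in groups)
rho_bar = 1 - 6 * (sum_d2 + total_delta) / (N * (N * N - 1))
```

The groups found are of sizes 2, 3, 4 and 5, the total correction to $\sum D^2$ is 17.5, and the two coefficients round to 0.7374 and 0.6989.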


Conclusion. A method has been developed for the treatment of rank correlation where there are groups of equal variates. The method consists of applying a generally small correction to the value as ordinarily calculated by the mid-rank method in order to find the value which would be obtained by averaging the values of the rank correlation coefficient for all possible ways of arbitrarily assigning ranks to the equal variates. Thanks are due Professor P. S. Dwyer, without whose aid and encouragement this paper would not have been written.

REFERENCES

[1] Yule and Kendall, Introduction to the Theory of Statistics, p. 248. London, 1937.

[2] P. DuBois, "Formulas and Tables for Rank Correlation," Psychol. Rec., Vol. 3 (1939), pp. 46-56.

University of Michigan,
Ann Arbor, Michigan







NOTE ON THEORETICAL AND OBSERVED DISTRIBUTIONS OF 
REPETITIVE OCCURRENCES 

By P. S. Olmstead

1. A simple problem of repetitive occurrences. Two questions which the engineer often desires to answer whenever he has a new type of apparatus or a new design of an old type of apparatus are: How many times will it perform its intended function without failure? and How many times will it fail to perform its intended function in a given length of time? To do this, he selects a number of what he believes to be identical units of the apparatus and gives each unit a performance test under a uniform test procedure. The number of satisfactory operations prior to the first observed failure to perform this operation is called a "run" and is a measure of the type desired for each unit.

If it is assumed that the probability of failure at any operation is a constant, $q$, and the probability of satisfactory operation is $1 - q$ or $p$, then the mathematical probabilities of runs of 0, 1, 2, 3, $\cdots$ satisfactory operations for any unit are

(1) $q,\ pq,\ p^2q,\ p^3q,\ \cdots$

respectively.

Let $x$ denote the number of satisfactory operations in any run. The mean value of $x$, say $m_x$, is given by

(2) $m_x = \dfrac{p}{q}.$

The variance of $x$ is

(3) $\sigma_x^2 = \dfrac{p}{q^2}.$
The first step in practice is to determine whether there exists a constant probability, $p$, by means of the application of the operation of statistical control.¹ Expressions (1), (2), and (3) provide the necessary information for doing this. When a constant probability exists as evidenced by at least 26 consecutive samples of 4 units each the following practical procedure has been found to be satisfactory.

1. An estimate of $p$ (or $q$), the sole parameter of the distribution, can be obtained from the average length of run in the sample. If $p$ is less than 0.6 and if the sample size is large, a reasonably good estimate of $p$ can be obtained from the proportion of the sample having runs of zero length.

2. The probability of getting runs of length $x$ or more is $p^x$. Thus, if a minimum (or maximum) value of the probability, $p^x$, is chosen, a maximum (or minimum) expected length of run can be computed for use as a criterion for looking for assignable causes of variation in the length of individual runs by using the estimated value of $p$.

3. The average and standard deviation to be used in calculating the limits to be applied to successive samples of rational sub-groups in accordance with the Shewhart² Criterion I are given by Equations (2) and (3) in which the estimates of $p$ and $q$ are substituted.

¹ W. A. Shewhart, "Statistical Method from the Viewpoint of Quality Control," The Department of Agriculture Graduate School, Washington, 1939, Chapter I.
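The quantities used in the three steps above can be written out as a short sketch. All function names are hypothetical; the two estimators simply invert $m_x = p/q$ and the zero-run probability $q$ from (1).

```python
def run_probability(x, p):
    """Probability of a run of exactly x satisfactory operations, expression (1)."""
    return (1 - p) * p ** x

def mean_run(p):
    """m_x = p/q, expression (2)."""
    return p / (1 - p)

def variance_run(p):
    """Variance p/q^2, expression (3)."""
    return p / (1 - p) ** 2

def prob_run_at_least(x, p):
    """Probability of a run of length x or more."""
    return p ** x

def estimate_p_from_average(average_run):
    """Solve average = p / (1 - p) for p (step 1)."""
    return average_run / (1 + average_run)

def estimate_p_from_zero_runs(fraction_zero):
    """The fraction of zero-length runs estimates q, so p = 1 - q (step 1)."""
    return 1 - fraction_zero

# Numerical check of (2) and (3) against a long partial sum, for an assumed p.
p = 0.3
series_mean = sum(x * run_probability(x, p) for x in range(400))
series_var = sum(x * x * run_probability(x, p) for x in range(400)) - series_mean ** 2
```

With $p = 0.3$ the partial sums agree with $p/q$ and $p/q^2$ to well within rounding error.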

2. Application to a signal transmission problem. The theoretical solution 
given above is a direct answer to the first question at the head of this note. 

TABLE I

Observed distributions of runs of x occurrences of event E for various test periods of apparatus life

No. of occurrences                          Test Period
per period (x)     Freq.      1     2     3     4     5     6     7     8    11    15
      0            n_0      878  1519   961   723   541   407   343   266   160    77
      1            n_1       77   226   207   206   171   148   129    97    70    35
      2            n_2        2    31    44    55    68    46    52    39    37    27
      3            n_3        1     3     8    18    15    19    13    22    19    10
      4            n_4              2     1     2     —     6     5     5     7     3
      5            n_5                    —     1     1     3     1     1     5     2
      6            n_6                    1                 1           —     1     2
      7            n_7                                                  1     —     —
      8            n_8                                                        2     1
Sample size        n        958  1781  1222  1005   796   630   543   431   301   157


The second question is also of interest, particularly when failure to perform an
operation does not impair the apparatus unit for performance of additional
operations. In cases of this type, the engineer often lets his test continue for
test periods of particular lengths, measured in numbers of operations or sometimes
in intervals of time (i.e., time intervals are often considered to be proportional
to numbers of operations) and observes the number of failures during the
test period for each unit. Thus, he may, after he has assured himself that
control exists, arrange his data for each test period to show the frequency of
occurrence of 0, 1, 2, 3, ... failures per unit.

² Loc. cit.

Data of this type, which are typical of those found in other studies made
during the past two years, are presented in Table I. These were obtained in a
signal transmission study in which the data for successive periods were obtained

TABLE II

Comparison of observed and theoretical values of averages and variances for
distributions of Table I

Statistic or                                            Test Period
Parameter                                1      2      3      4      5      6      7      8     11     15

$\hat{q}$                 observed     .916   .853   .786   .719   .679   .646   .632   .617   .532   .491
$\bar{x}$                 observed     .098   .171   .269   .381   .448   .543   .537   .633   .917  1.026
$m_x = \hat{p}/\hat{q}$   theoretical* .091   .172   .272   .390   .471   .548   .583   .620   .881  1.039
$\sigma_x^2$              observed     .091   .200   .343   .497   .556   .832   .760  1.075  1.783  1.921
$\sigma_x^2 = \hat{p}/\hat{q}^2$ theoretical* .098 .202 .345 .542   .693   .848   .924  1.005  1.658  2.117

* Based on assumption that $\hat{q}$ is the true value of q.



TABLE III

Theoretical distributions corresponding to distributions of Table I calculated by
using $\hat{q} = n_0/n$ as the true value of q

No. of                                          Test Period
Occurrences   Freq.
per Period, x             1       2       3       4       5       6       7       8      11      15

     0        n_0*     878.0  1519.0   961.0   723.0   541.0   407.0   343.0   266.0   160.0    77.0
     1        n_1       73.3   223.5   205.3   202.8   173.3   144.1   126.4   101.9    74.9    39.2
     2        n_2        6.1    32.9    43.8    56.9    55.5    51.0    46.6    39.0    35.1    20.0
     3        n_3         .5     4.8     9.4    16.0    17.8    18.0    17.1    14.9    16.5    10.2
     4        n_4         .1      .7     2.0     4.5     5.7     6.4     6.3     5.7     7.7     5.2
     5        n_5                 .1      .4     1.3     1.8     2.3     2.3     2.2     3.6     2.6
     6        n_6                         .1      .4      .6      .8      .9      .8     1.7     1.4
     7        n_7                                 .1      .2      .3      .3      .3      .8      .7
     8        n_8                                         .1      .1      .1      .1      .4      .3
 9 or over    n_9+                                                                .1      .3      .4

Sample Size   n*       958    1781    1222    1005     796     630     543     431     301     157

* The observed values of n_0 and n form the basis for the calculated distributions.



for separate units. Since each set of these data passed the scrutiny for control,
there is justification for assuming that a statistical universe exists and that its
functional form may be derived from the observed distribution. It was found
that these data were consistent with the assumption that, where the probability
of non-occurrence of a failure on a unit in the test period was q, the probability
of exactly x failures on a unit was $p^x q$. This set of mathematical probabilities
is shown in (1) with q redefined to apply in this case to non-occurrence of a
failure.

Observed and “Theoretical” values of the averages and variances for the
observed distributions are shown in Table II. The basis for calculating the
theoretical values was to take the ratio (designated $\hat{q}$) of $n_0$ to n for each
distribution as the estimate of the true value, q. Distributions as shown in Table III

TABLE IV

Test of fit of theoretical to observed distributions (Table III and Table I, respectively)

                                          Test Period
                       1      2      3      4      5      6      7      8     11     15

$\chi^2$             2.24   0.20   0.32   2.09   9.79   0.65   3.20   6.27   1.07   3.98
Degrees of Freedom      1      2      2      3      3      3      3      3      4      4
$P_{\chi^2}$*         .13    .90    .87    .55    .02    .87    .36    .10    .90    .41

* Minimum number in cell for theoretical distribution taken as 5.



were calculated from each $\hat{q}$. These distributions were tested against the
observed distributions by means of the $\chi^2$ test with the results shown in Table IV,
which are all within reasonable limits of what might be expected when a constant
probability exists.
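
The fitting procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `geometric_fit` and `pooled_chi_square` are hypothetical, the tail cells are pooled so that the pooled theoretical frequency is at least 5, and the degrees of freedom are taken as the number of cells less two (one for the total, one for the estimated q). The note does not spell out its exact pooling rule, so the statistic below need not reproduce Table IV to the last digit.

```python
def geometric_fit(observed):
    """observed[x] is the number of units showing exactly x failures."""
    n = sum(observed)
    q_hat = observed[0] / n            # estimate of q: share of units with no failure
    p_hat = 1.0 - q_hat
    expected = [n * q_hat * p_hat ** x for x in range(len(observed))]
    return q_hat, expected


def pooled_chi_square(observed, q_hat, min_expected=5.0):
    """Chi-square with the upper tail pooled into one cell (assumed pooling rule)."""
    n = sum(observed)
    p_hat = 1.0 - q_hat
    m = 1                              # cells 0..m-1 stay separate, x >= m is pooled
    while n * p_hat ** (m + 1) >= min_expected:
        m += 1
    padded = list(observed) + [0] * max(0, m - len(observed))
    obs_cells = padded[:m] + [sum(padded[m:])]
    exp_cells = [n * q_hat * p_hat ** x for x in range(m)] + [n * p_hat ** m]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
    return chi2, len(obs_cells) - 2


# Test period 1 of Table I.
period_1 = [878, 77, 2, 1]
q_hat, expected = geometric_fit(period_1)
chi2, dof = pooled_chi_square(period_1, q_hat)
```

For test period 1 the theoretical frequencies agree with the first column of Table III to one decimal, and the statistic has one degree of freedom, as in Table IV.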

3. Conclusions. When a constant probability applies to each operation in a
repetitive process, this note shows how to establish criteria for identifying
significantly long or short lengths for individual runs and significantly high or low
average lengths for groups of several runs. A problem taken from the field of
signal transmission gives assurance of the existence of this type of distribution
in practice.

Bell Telephone Laboratories,

New York, N. Y. 




THE DISTRIBUTION THEORY OF RUNS 

By A. M. Mood 

1. Introduction. In studying a particular sample, the order in which the 
elements of the sample were drawn is frequently available to the statistician. 
This important information is usually entirely neglected by him. Such disregard
must be attributed, to a considerable extent, to the unsatisfactory state
of mathematical devices for using the knowledge in question. One reasonable
mathematical method for handling this information, the one to be used in this
paper, is to make use of the distribution of runs. A run is defined as a succession
of similar events preceded and succeeded by different events; the number of
elements in a run will be referred to as its length.

The distribution theory of runs has had a stormy career. The theory seems 
to have been started toward the end of the nineteenth century rather than in the 
days of Laplace when there was so much interest in games of chance. In 1897 
Karl Pearson [1], in a discussion of data taken from the roulette tables at Monte 
Carlo, wrote “.. . the theory of runs is a very simple one.” In this book he 
developed no theory but it is evident from his computations that he regarded the 
distribution of runs as a special case of the multinomial distribution. The 
multinomial method, besides evading the issue somewhat and raising questions 
of random sampling, also gives incorrect results when one is interested in runs 
of more than one kind of element. In 1899 Karl Marbe [2] derived an expression
for the mean of the number of iterations of a given length from a binomial
population. This result was incorrect because he neglected dependence between
overlapping iterations. An iteration is defined as a sequence of similar events; a
run of length t is counted as t − s + 1 iterations of length s for s ≤ t. Marbe
has assembled a great mass of data with the object of proving the popular 
hypothesis that a “head” becomes highly probable after a long succession of 
“tails” has appeared. Ordinary significance tests applied to his data do not 
support this contention, but Marbe continues to advocate it [3] and [5]. Of 
course, he has been severely criticised by many mathematical statisticians. 

In 1904 Grünbaum [6] derived the mean of the number of runs of given length
from a binomial population by the multinomial method. The first correct
formulae were derived in 1906 by Bruns [7] who found the mean and variance of
the number of iterations of given length in samples from a binomial population.
In a book published in 1917 von Bortkiewicz correctly derived for the first time
the mean and variance of runs from a binomial population using a method similar
to that of Bruns. This book [8] contains a great many formulae for means and
variances of runs and iterations under various special circumstances; a large
portion of it is devoted to an exhaustive criticism of Marbe's work. In 1921 von
Mises [9] showed that the number of long runs of given length was approximately
distributed according to the Poisson law for large samples.

It was not until 1925 (so far as the author has been able to ascertain) that an 
actual distribution function appeared when Ising [10] gave the number of ways of 
obtaining a given total number of runs (without regard to length) from arrangements
of two kinds of elements. Stevens [12] in 1939 published the same distribution
and described a $\chi^2$ criterion for significance. Wald and Wolfowitz [13]
in 1940 published the same distribution and showed that it was asymptotically 
normal. These papers are all concerned with random arrangements of a fixed 
number of elements of each of two kinds; the last mentioned paper describes a 
very interesting application of the distribution to the problem of testing the 
hypothesis that two samples have come from the same continuous distribution. 
Wishart and Hirshfeld [11] in 1936 derived the distribution of the total number of 
runs (again without regard to length) in samples from a binomial population and 
showed it was asymptotically normal. 

In this paper we shall derive distributions of runs of given length both from 
random arrangements of fixed numbers of elements of two or more kinds, and 
from binomial and multinomial populations. Also we shall give the limiting 
form of these distributions as the sample size increases. These limiting
distributions are all normal. The distribution problem is, of course, a combinatorial
one, and the whole development depends on some identities in combinatory
analysis, some new and some well known to students of partition theory.

The paper will be divided into two parts. The first will deal with distributions
obtained from random arrangements of a fixed number of each kind of
element. The second will deal with distributions of elements from a binomial
or multinomial population.


Part I

2. Distribution of runs of two kinds of elements. Consider random arrangements
of n elements of two kinds, for example $n_1$ a's and $n_2$ b's with $n_1 + n_2 = n$.
Let $r_{1i}$ denote the number of runs of a's of length i, and let $r_{2i}$ denote the number
of runs of b's of length i. For example the arrangement

a b b a b a a a b b a a a

will be characterized by the numbers $r_{11} = 2$, $r_{13} = 2$, $r_{21} = 1$, $r_{22} = 2$, and all
other $r_{ij} = 0$. Also we let $r_1 = \sum_i r_{1i}$ and $r_2 = \sum_i r_{2i}$ denote the total number of
runs of a's and b's respectively. Throughout the paper a binomial coefficient
will be denoted by

(2.1)  $\binom{m}{k} = \dfrac{m!}{k!\,(m-k)!}$

and this is defined to be zero when m < k. A multinomial coefficient will often
be denoted by

(2.2)  $\begin{bmatrix} m \\ m_i \end{bmatrix} = \dfrac{m!}{m_1!\, m_2! \cdots m_k!}$,

(2.3)  $\sum_i m_i = m, \qquad m_i \ge 0$,

and when such a coefficient is to be summed over the indices $m_i$ the two conditions
(2.3) are always understood and will not be repeated; other conditions on
the indices will be placed below the summation sign.
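
The run counts $r_{ij}$ defined at the start of this section are easy to tabulate mechanically. The following is a minimal sketch; the helper name `run_length_counts` is illustrative and not part of the paper. It reproduces the numbers quoted for the example arrangement.

```python
from collections import Counter
from itertools import groupby


def run_length_counts(arrangement):
    """Map each kind of element to a Counter {run length: number of runs}."""
    counts = {}
    for symbol, group in groupby(arrangement):
        length = sum(1 for _ in group)      # length of this maximal run
        counts.setdefault(symbol, Counter())[length] += 1
    return counts


counts = run_length_counts("abbabaaabbaaa")
r1 = sum(counts["a"].values())   # total number of runs of a's
r2 = sum(counts["b"].values())   # total number of runs of b's
```

Here `counts["a"]` holds the $r_{1i}$ and `counts["b"]` the $r_{2i}$; lengths that do not occur are simply absent, which corresponds to "all other $r_{ij} = 0$".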

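Given a set of numbers $r_{ij}$ (i = 1, 2; j = 1, 2, ..., $n_i$) such that $\sum_j j\,r_{ij} = n_i$,
there are $\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix}$ and $\begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$ different arrangements of the runs of a's and b's
respectively. Hence the total number of ways of obtaining the set $r_{ij}$ is

$N(r_{ij}) = \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)$

where $F(r_1, r_2)$ is the number of ways of arranging $r_1$ objects of one kind and $r_2$
objects of another so that no two adjacent objects are of the same kind. Thus

$F(r_1, r_2) = 0$ if $|r_1 - r_2| > 1$,
$\phantom{F(r_1, r_2)} = 1$ if $|r_1 - r_2| = 1$,
$\phantom{F(r_1, r_2)} = 2$ if $r_1 = r_2$.

Since there are $\binom{n}{n_1}$ possible arrangements of the a's and b's, we have at once the
distribution of the $r_{ij}$

(2.6)  $P(r_{ij}) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)}{\binom{n}{n_1}}$.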


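Certain marginal distributions will also be of interest. To obtain, for example,
the distribution of the $r_{1j}$, it is first necessary to sum $\begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$ over all partitions of
$n_2$. This is easily accomplished by finding the coefficient of $x^{n_2}$ in

$(x + x^2 + x^3 + \cdots)^{r_2} = x^{r_2}(1 + x + x^2 + \cdots)^{r_2} = \dfrac{x^{r_2}}{(1 - x)^{r_2}} = x^{r_2} \sum_{t=0}^{\infty} \binom{r_2 - 1 + t}{r_2 - 1} x^t$.

The term corresponding to $t = n_2 - r_2$ gives the desired result:

(2.7)  $\sum \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} = \binom{n_2 - 1}{r_2 - 1}$.

We have then

(2.8)  $P(r_{1j}, r_2) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2)}{\binom{n}{n_1}}$

and summing this over $r_2$, a slight simplification gives

(2.9)  $P(r_{1j}) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}}$.

The distribution (2.6) summed over $r_{1j}$ and $r_{2j}$ gives by means of (2.7)

(2.10)  $P(r_1, r_2) = \dfrac{\binom{n_1 - 1}{r_1 - 1} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2)}{\binom{n}{n_1}}$

which is essentially the distribution derived by Wald and Wolfowitz [13], and
summing this over $r_2$ we get the distribution discussed by Stevens [12]

(2.11)  $P(r_1) = \dfrac{\binom{n_1 - 1}{r_1 - 1} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}}$.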
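
As a check on the marginal distribution of the total number of runs of a's, the count can be compared with a complete enumeration for a small case. The sketch below is illustrative only: the function names are hypothetical, the sizes $n_1 = 4$, $n_2 = 3$ are an arbitrary choice, and the formula coded is the standard Stevens form $\binom{n_1-1}{r_1-1}\binom{n_2+1}{r_1}/\binom{n}{n_1}$ for (2.11).

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def runs_of_a(arrangement):
    """Number of maximal runs of the symbol 'a'."""
    return sum(1 for symbol, _ in groupby(arrangement) if symbol == "a")


def enumerated_distribution(n1, n2):
    """Exact distribution of r_1 over all equally likely arrangements."""
    n = n1 + n2
    tally = {}
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        arrangement = ["a" if t in chosen else "b" for t in range(n)]
        r1 = runs_of_a(arrangement)
        tally[r1] = tally.get(r1, 0) + 1
    total = comb(n, n1)
    return {r1: Fraction(c, total) for r1, c in tally.items()}


def formula_2_11(r1, n1, n2):
    """Probability of r1 runs of a's according to (2.11)."""
    return Fraction(comb(n1 - 1, r1 - 1) * comb(n2 + 1, r1), comb(n1 + n2, n1))


n1, n2 = 4, 3
enumerated = enumerated_distribution(n1, n2)
predicted = {r1: formula_2_11(r1, n1, n2) for r1 in enumerated}
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.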


Another marginal distribution which will be useful is obtained by summing
(2.9) over $r_{1i}$ for $i \ge k$. If we let

$s_{1j} = r_{1j}, \quad j < k, \qquad s_{1k} = \sum_{j=k}^{n_1} r_{1j}, \qquad A = \sum_{j=1}^{k-1} j\,r_{1j}$,

we must then sum the multinomial coefficient

$\dfrac{s_{1k}!}{r_{1k}! \cdots r_{1n_1}!}$

over all partitions of $n_1 - A$ such that every part is greater than k − 1. This
is given by the coefficient of $x^{n_1 - A}$ in

$(x^k + x^{k+1} + \cdots)^{s_{1k}} = x^{k s_{1k}} \sum_{t=0}^{\infty} \binom{s_{1k} - 1 + t}{s_{1k} - 1} x^t$;

thus we have

(2.12)  $\sum_{(k)} \dfrac{s_{1k}!}{r_{1k}! \cdots r_{1n_1}!} = \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1}$

where $\sum_{(k)}$ denotes summation over all positive integers $r_{1k}, r_{1,k+1}, \ldots, r_{1n_1}$
such that $\sum_{j \ge k} j\,r_{1j} = n_1 - A$. This identity with (2.9) gives

(2.13)  $P(s_{1i}) = \dfrac{\begin{bmatrix} r_1 \\ s_{1i} \end{bmatrix} \binom{n_2 + 1}{r_1} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1}}{\binom{n}{n_1}}, \qquad i = 1, 2, \ldots, k$.
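
The pooled distribution can be checked against a complete enumeration for a small case. The sketch below is illustrative and rests on stated assumptions: it codes (2.13) as the multinomial coefficient $r_1!/(s_{11}! \cdots s_{1k}!)$ times $\binom{n_2+1}{r_1}$ times the count in (2.12), divided by $\binom{n}{n_1}$; the helper names are hypothetical; and the boundary case $s_{1k} = 0$ (no long runs) is handled explicitly, since the binomial symbol with lower index −1 needs a convention there.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb, factorial


def parts_at_least(total, parts, k):
    """Ways to write `total` as an ordered sum of `parts` integers, each >= k."""
    if parts == 0:
        return 1 if total == 0 else 0
    top = total - (k - 1) * parts - 1
    return comb(top, parts - 1) if top >= 0 else 0


def formula_2_13(s, n1, n2, k):
    """Probability of the pooled run counts s = (s_11, ..., s_1k)."""
    r1 = sum(s)
    A = sum(i * s[i - 1] for i in range(1, k))   # elements used by runs shorter than k
    multinomial = factorial(r1)
    for value in s:
        multinomial //= factorial(value)
    ways = multinomial * comb(n2 + 1, r1) * parts_at_least(n1 - A, s[-1], k)
    return Fraction(ways, comb(n1 + n2, n1))


def enumerated_pooled(n1, n2, k):
    """Exact distribution of (s_11, ..., s_1k) by listing every arrangement."""
    n = n1 + n2
    tally = {}
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["a" if t in chosen else "b" for t in range(n)]
        lengths = [sum(1 for _ in g) for sym, g in groupby(seq) if sym == "a"]
        s = tuple(sum(1 for L in lengths if L == i) for i in range(1, k))
        s += (sum(1 for L in lengths if L >= k),)
        tally[s] = tally.get(s, 0) + 1
    return {s: Fraction(c, comb(n, n1)) for s, c in tally.items()}


n1, n2, k = 5, 3, 2
enumerated = enumerated_pooled(n1, n2, k)
predicted = {s: formula_2_13(s, n1, n2, k) for s in enumerated}
```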


Another useful distribution analogous to (2.13) is derived by considering runs
of both kinds of elements. If we define $s_{2j}$ (j = 1, 2, ..., h) and B in terms of
$r_{2j}$ just as $s_{1i}$ and A were defined above, it follows at once from (2.6) and (2.12)
that

(2.14)  $P(s_{1i}, s_{2j}) = \dfrac{\begin{bmatrix} r_1 \\ s_{1i} \end{bmatrix} \begin{bmatrix} r_2 \\ s_{2j} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} \binom{n_2 - B - (h-1)s_{2h} - 1}{s_{2h} - 1} F(r_1, r_2)}{\binom{n}{n_1}}$,

i = 1, 2, ..., k; j = 1, 2, ..., h.

These last two distributions should be the most useful for applications. The
long runs have been added together to form the new variables $s_{1k}$ and $s_{2h}$, thus
decreasing materially the number of variables as compared with (2.6) and (2.9)
while at the same time little information is lost. One is free to choose k and h
so that the number of variables is appropriate for the data at hand. Moreover,
it is shown in Section 5 that these variables are asymptotically normally distributed
so that one may apply a simple $\chi^2$ test of significance for “randomness of
elements with respect to order” when dealing with large samples. We shall
then be able to test whether a sample has been “randomly” drawn in a certain
sense.



3. Moments for runs of two kinds of elements. Instead of dealing with the
ordinary moments we shall obtain formulae for the factorial moments because
the expressions are much more compact. As is customary, a factorial will be
denoted by

(3.1)  $x^{(a)} = x(x - 1)(x - 2) \cdots (x - a + 1)$,

and $x^{(0)}$ is defined to be 1. Of course the ordinary moments are determined by
the factorial moments by means of relations of the type

$x^a = \sum_{i=0}^{a} C_i^a x^{(i)}$.

A recent discussion of the coefficients $C_i^a$ has been given by Joseph [14]. The
mathematical expectation of a function f(r) will be denoted by

$E(f) = \sum f(r) P(r)$.

Of course E is a linear operator. We shall require the following identity

(3.3)  $\sum_{(1)} \prod_i r_{1i}^{(a_i)} \begin{bmatrix} r_1 \\ r_{1i} \end{bmatrix} = r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1}$

where $\sum_{(1)}$ denotes summation over all positive integers $r_{11}, r_{12}, \ldots, r_{1n_1}$ such
that $\sum_i i\,r_{1i} = n_1$. (3.3) may be verified by differentiating

$\varphi(t_i) = (t_1 x + t_2 x^2 + \cdots)^{r_1}$

$a_i$ times with respect to $t_i$ (i = 1, 2, ..., $n_1$), then finding the coefficient of
$x^{n_1}$ after putting $t_i = 1$. The identity (3.3) enables us to find the factorial
moments of the variables in the distribution (2.9) for we have

(3.4)  $E\left(\prod_i r_{1i}^{(a_i)}\right) = \sum_{r_1} \sum_{(1)} \prod_i r_{1i}^{(a_i)} \begin{bmatrix} r_1 \\ r_{1i} \end{bmatrix} \binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}$

$\qquad = \sum_{r_1} r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}$

$\qquad = (n_2 + 1)^{(\sum a_i)} \binom{n - \sum (i+1) a_i}{n_1 - \sum i a_i} \Big/ \binom{n}{n_1}$.


The sum on $r_1$ involved in the last step is given by the identity

(3.5)  $\sum_i \binom{M}{i} \binom{N}{c - i} = \binom{M + N}{c}$

which is readily obtained by equating coefficients of $x^c$ in

$(1 + x)^M (1 + x)^N = (1 + x)^{M + N}$.


We shall give here the means, variances and covariances obtained from (3.4):

(3.6)  $E(r_{1i}) = (n_2 + 1)^{(2)} n_1^{(i)} / n^{(i+1)}$,

(3.7)  $\sigma_{ii} = \dfrac{n_2^{(2)} (n_2 + 1)^{(2)} n_1^{(2i)}}{n^{(2i+2)}} + \dfrac{(n_2 + 1)^{(2)} n_1^{(i)}}{n^{(i+1)}} \left(1 - \dfrac{(n_2 + 1)^{(2)} n_1^{(i)}}{n^{(i+1)}}\right)$,

(3.8)  $\sigma_{ij} = \dfrac{n_2^{(2)} (n_2 + 1)^{(2)} n_1^{(i+j)}}{n^{(i+j+2)}} - \dfrac{(n_2 + 1)^{(2)} (n_2 + 1)^{(2)} n_1^{(i)} n_1^{(j)}}{n^{(i+1)} n^{(j+1)}}$.

These will be needed in the section dealing with asymptotic distributions. The
moments for the distribution (2.6) follow at once from (3.3) as

(3.9)  $E\left(\prod_i r_{1i}^{(a_i)} \prod_j r_{2j}^{(b_j)}\right) = \sum_{r_1, r_2} r_1^{(\sum a_i)} r_2^{(\sum b_j)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 - \sum j b_j - 1}{r_2 - \sum b_j - 1} F(r_1, r_2) \Big/ \binom{n}{n_1}$.

The summation on $r_2$ is accomplished by putting $r_2 = r_1 - 1$, $r_1$, and $r_1 + 1$,
but after that has been done it is necessary to expand the product of the two
factorial factors in factorial powers of the lower index of one of the binomial
coefficients. This is easily done for the first few moments, but there appears
to be no simple expression for the general case. The means, variances and
covariances of $r_{1i}$ are given by (3.6), (3.7) and (3.8) and those of $r_{2j}$ are obtained
from these equations by interchanging $n_1$ and $n_2$. The other covariances are

(3.10)  $\sigma_{r_{1i} r_{2j}} = \dfrac{n_1^{(i+2)} n_2^{(j+2)}}{n^{(i+j+2)}} + 4\,\dfrac{n_1^{(i+1)} n_2^{(j+1)}}{n^{(i+j+1)}} + 2\,\dfrac{n_1^{(i)} n_2^{(j)}}{n^{(i+j)}} - \dfrac{(n_1 + 1)^{(2)} (n_2 + 1)^{(2)} n_1^{(i)} n_2^{(j)}}{n^{(i+1)} n^{(j+1)}}$.


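A slight variation of the method above will give the moments of the $s_{1i}$ in
the distribution (2.13). An accent on a summation sign will indicate that the
term corresponding to i = k is to be omitted. Differentiating

$\varphi(t_i) = [t_1 x + t_2 x^2 + \cdots + t_{k-1} x^{k-1} + t_k (x^k + x^{k+1} + \cdots)]^{r_1}$

$a_i$ times with respect to $t_i$ and finding the coefficient of $x^{n_1}$ after putting $t_i = 1$,
we obtain

(3.11)  $\sum \prod_i s_{1i}^{(a_i)} \begin{bmatrix} r_1 \\ s_{1i} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} = r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i + a_k - 1}{r_1 - \sum' a_i - 1}$.

This with (2.13) gives by the same steps as used in obtaining (3.4)

(3.12)  $E\left(\prod_i s_{1i}^{(a_i)}\right) = (n_2 + 1)^{(\sum a_i)} \binom{n - \sum (i+1) a_i + a_k}{n_1 - \sum i a_i} \Big/ \binom{n}{n_1}$.

The first two moments are

(3.13)  $E(s_{1k}) = (n_2 + 1)\, n_1^{(k)} / n^{(k)}$,

(3.14)  $\sigma_{ik} = \dfrac{n_2 (n_2 + 1)^{(2)} n_1^{(i+k)}}{n^{(i+k+1)}} - \dfrac{(n_2 + 1)(n_2 + 1)^{(2)} n_1^{(i)} n_1^{(k)}}{n^{(i+1)} n^{(k)}}$,

(3.15)  $\sigma_{kk} = \dfrac{(n_2 + 1)^{(2)} n_1^{(2k)}}{n^{(2k)}} + \dfrac{(n_2 + 1) n_1^{(k)}}{n^{(k)}} \left(1 - \dfrac{(n_2 + 1) n_1^{(k)}}{n^{(k)}}\right)$.

The others are, of course, given by (3.6), (3.7) and (3.8).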

The joint moments of the variables in (2.14) as obtained from (3.11) are 


(3.16) 


e or .s-’C’) - r + “* ') 

a ««.*» \ ~Ot — 1 / 


In 2 - 2 A + 6* - 1 

\ - S'~ 1 

In addition to the covariances (3.10) we shall need 


F(si, 8 S ) 


<r »u*x ~ 


(t+2) 0+D , «>_(*+«_ O+l) «,(*+!> „0> . „«)„(;') 

Wi n 2 t“ ^2 . 0 rii w 2 T ^1 ?&2 




+ 1) (2) (w 2 + 1 ) (2) n{ fc) n 2 J) 

n (*) n (;4-I) » 


(3.18) 


<*+d„(*+ n („ i I (*)„(*) 

i n 2 . c% n i n 2 (n 1 + l)(n 2 + l)ni n 2 


__ Wi n 2 r 


fl(k+h— 1) 


«(*)«(*) 


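The moments of $r_1$ in the distribution (2.11) may be derived easily by means
of (3.5) as

(3.19)  $E(r_1^{(a)}) = (n_2 + 1)^{(a)} \binom{n - a}{n_1 - a} \Big/ \binom{n}{n_1}$.

From which

(3.20)  $E(r_1) = \dfrac{(n_2 + 1)\, n_1}{n}$,

(3.21)  $\sigma_{r_1}^2 = \dfrac{(n_2 + 1)^{(2)} n_1^{(2)}}{n\, n^{(2)}}$.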
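
The mean and variance of the total number of runs of a's can be recomputed directly from the distribution (2.11). The sketch below is illustrative; it assumes (2.11) in the form $\binom{n_1-1}{r_1-1}\binom{n_2+1}{r_1}/\binom{n}{n_1}$ and compares the resulting moments with $(n_2+1)n_1/n$ and $(n_2+1)^{(2)} n_1^{(2)}/(n\,n^{(2)})$. The function name and the sizes used are arbitrary.

```python
from fractions import Fraction
from math import comb


def moments_of_r1(n1, n2):
    """Mean and variance of r_1 computed from the probabilities (2.11)."""
    n = n1 + n2
    total = comb(n, n1)
    probs = {r1: Fraction(comb(n1 - 1, r1 - 1) * comb(n2 + 1, r1), total)
             for r1 in range(1, n1 + 1)}
    assert sum(probs.values()) == 1          # (2.11) is a proper distribution
    mean = sum(r1 * p for r1, p in probs.items())
    second = sum(r1 * r1 * p for r1, p in probs.items())
    return mean, second - mean * mean


n1, n2 = 6, 4
n = n1 + n2
mean, variance = moments_of_r1(n1, n2)
mean_formula = Fraction((n2 + 1) * n1, n)                                   # (3.20)
variance_formula = Fraction((n2 + 1) * n2 * n1 * (n1 - 1), n * n * (n - 1))  # (3.21)
```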


4. Distribution and moments of runs of k kinds of elements. This section
is a generalization of the preceding two sections to several kinds of elements.
The case k = 2 was treated separately because the special character of the
function $F(r_1, r_2)$ in this instance made the distribution comparatively simple.
Now we shall be interested in k kinds of elements denoted by $a_1, \ldots, a_k$ and
we shall suppose there are $n_i$ elements of the ith kind. We let $r_{ij}$ denote the
number of runs of elements of the ith kind of length j, and put

$n = \sum_{i=1}^{k} n_i, \qquad r_i = \sum_{j=1}^{n_i} r_{ij}$.

The same argument as was used in deriving (2.6) gives

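(4.1)  $P(r_{ij}) = \prod_{i=1}^{k} \begin{bmatrix} r_i \\ r_{ij} \end{bmatrix} F(r_1, r_2, \ldots, r_k) \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

where the function $F(r_1, r_2, \ldots, r_k)$, which will be referred to hereafter simply
as $F(r_i)$, represents the number of different arrangements of $r_1$ objects of one
kind, $r_2$ objects of a second kind, and so forth, such that no two adjacent objects
are of the same kind. We shall be able to give the explicit expression for $F(r_i)$
after examining the marginal distribution $P(r_i)$. This is obtained by summing
(4.1) over $r_{ij}$ with $r_i$ fixed by means of the identity (2.7) giving

(4.2)  $P(r_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$.

Despite our present meager knowledge of $F(r_i)$ it is possible to find the
moments of the $r_i$ as distributed by (4.2). Since $\sum P(r_i) = 1$, we have the
identity

(4.3)  $\sum_{r_i} \prod_i \binom{n_i - 1}{r_i - 1} F(r_i) = \begin{bmatrix} n \\ n_i \end{bmatrix}$.

From this the moments are easily derived. If we put

(4.4)  $u_i = n_i - r_i$

we have

$\sum \prod_i u_i^{(a_i)} \prod_i \binom{n_i - 1}{r_i - 1} F(r_i) = \sum \prod_i (n_i - r_i)^{(a_i)} \prod_i \binom{n_i - 1}{r_i - 1} F(r_i)$

$\qquad = \sum \prod_i (n_i - 1)^{(a_i)} \prod_i \binom{n_i - a_i - 1}{r_i - 1} F(r_i)$

$\qquad = \prod_i (n_i - 1)^{(a_i)} \sum \prod_i \binom{n_i - a_i - 1}{r_i - 1} F(r_i)$

$\qquad = \prod_i (n_i - 1)^{(a_i)} \begin{bmatrix} n - \sum a_i \\ n_i - a_i \end{bmatrix}$.

The summation involved in the last step is given by (4.3). On dividing the
last equation by $\begin{bmatrix} n \\ n_i \end{bmatrix}$ we get the factorial moments of the $u_i$

(4.5)  $E\left(\prod_i u_i^{(a_i)}\right) = \prod_i (n_i - 1)^{(a_i)} \begin{bmatrix} n - \sum a_i \\ n_i - a_i \end{bmatrix} \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$.

From these equations the moments of the $r_i$ may be found; the means, variances
and covariances are

$E(r_i) = \dfrac{n_i (n - n_i + 1)}{n}$,

$\sigma_{ii} = \dfrac{n_i^{(2)} (n - n_i + 1)^{(2)}}{n\, n^{(2)}}$,

$\sigma_{ij} = \dfrac{n_i^{(2)} n_j^{(2)}}{n\, n^{(2)}}$.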
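
The means, variances and covariances of the total run counts $r_i$ for several kinds of elements can be checked by listing every distinct arrangement of a small multiset. The sketch below is only an illustration: the composition (3, 2, 2) and the helper names are arbitrary choices, and the formulas coded are $E(r_i) = n_i(n - n_i + 1)/n$, $\sigma_{ii} = n_i^{(2)}(n - n_i + 1)^{(2)}/(n\,n^{(2)})$ and $\sigma_{ij} = n_i^{(2)} n_j^{(2)}/(n\,n^{(2)})$.

```python
from fractions import Fraction
from itertools import groupby, permutations


def run_totals(arrangement, kinds):
    """Total number of runs of each kind in one arrangement."""
    totals = dict.fromkeys(kinds, 0)
    for symbol, _ in groupby(arrangement):
        totals[symbol] += 1
    return tuple(totals[kind] for kind in kinds)


def enumerated_moments(counts):
    """Exact means and covariance matrix of (r_1, ..., r_k) by enumeration."""
    kinds = sorted(counts)
    pool = "".join(kind * counts[kind] for kind in kinds)
    arrangements = set(permutations(pool))      # distinct arrangements only
    rows = [run_totals(arr, kinds) for arr in arrangements]
    m = len(rows)
    k = len(kinds)
    means = [Fraction(sum(row[i] for row in rows), m) for i in range(k)]
    cov = [[Fraction(sum(row[i] * row[j] for row in rows), m) - means[i] * means[j]
            for j in range(k)] for i in range(k)]
    return means, cov


counts = {"a": 3, "b": 2, "c": 2}
sizes = [counts[kind] for kind in sorted(counts)]
n = sum(sizes)
means, cov = enumerated_moments(counts)

denominator = n * n * (n - 1)                   # n times n^(2)
mean_formula = [Fraction(ni * (n - ni + 1), n) for ni in sizes]
cov_formula = [[Fraction(ni * (ni - 1) * (n - ni + 1) * (n - ni), denominator) if i == j
                else Fraction(ni * (ni - 1) * nj * (nj - 1), denominator)
                for j, nj in enumerate(sizes)] for i, ni in enumerate(sizes)]
```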


It is clear that

$\varphi(t_i)$ = Coefficient of $\prod_i x_i^{n_i}$ in

$\qquad (x_1 + \cdots + x_k)^k \prod_{i=1}^{k} (x_1 + \cdots + x_{i-1} + t_i x_i + x_{i+1} + \cdots + x_k)^{n_i - 1} \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

is a generating function for the moments of the variables $u_i$. This generating
function will enable us to find the exact expression for $F(r_i)$ for we have

k 

P(ui = tin) = Coefficient of JI t”** in <p(ti) 

- & ']/[:]• 

Xi »i/-ny-a* 

Also 

pw -?C;-0 F(r V[n”] 

and equating the expressions on the right of the last two equations we have 


(4.10) 


e Plnpv 1 ] 

TPf~\ — L a »J L n 'i J 

n (::0 

- ,s„ mv] 


Xin’d-rj-ai 

in which the prime on the n's indicates that the indices corresponding to j = i
are to be omitted; hence i takes all the values 1, 2, ..., k and j takes all values
1, 2, ..., k except i because the index $n_{ii}$ has been cancelled with $n_i - r_i$ in
the binomial coefficient in the denominator of (4.10). It is clear from (4.11)
that $F(r_i)$ may be expressed as follows

(4.12)  $F(r_i) = \mathrm{CT}\ \prod_i x_i^{-r_i}\, (x_1 + \cdots + x_k)^k\, (x_2 + x_3 + \cdots + x_k)^{r_1 - 1} (x_1 + x_3 + \cdots + x_k)^{r_2 - 1} \cdots (x_1 + \cdots + x_{k-1})^{r_k - 1}$

in which “CT” is an abbreviation for “constant term of.”
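
The constant-term expression for $F(r_i)$ can be compared with a direct count of arrangements in which no two adjacent objects are alike. The sketch below is illustrative only and rests on the reading of (4.12) as the coefficient of $\prod x_i^{r_i}$ in $(x_1 + \cdots + x_k)^k \prod_i (\sum_{j \ne i} x_j)^{r_i - 1}$; all function names are hypothetical. It also checks the identity (4.3) for one small composition.

```python
from functools import lru_cache
from itertools import product
from math import comb, factorial


def count_alternating(counts):
    """Arrangements of counts[i] objects of kind i with no two adjacent alike."""
    @lru_cache(maxsize=None)
    def extend(remaining, last):
        if sum(remaining) == 0:
            return 1
        total = 0
        for kind, left in enumerate(remaining):
            if left and kind != last:
                shorter = remaining[:kind] + (left - 1,) + remaining[kind + 1:]
                total += extend(shorter, kind)
        return total
    return extend(tuple(counts), -1)


def multiply_by_linear(poly, kinds):
    """Multiply a polynomial {exponent tuple: coefficient} by the sum of x_j, j in kinds."""
    out = {}
    for exps, coeff in poly.items():
        for j in kinds:
            key = exps[:j] + (exps[j] + 1,) + exps[j + 1:]
            out[key] = out.get(key, 0) + coeff
    return out


def constant_term_formula(r):
    """F(r) read off as a polynomial coefficient, following (4.12)."""
    k = len(r)
    poly = {(0,) * k: 1}
    for _ in range(k):                          # factor (x_1 + ... + x_k)^k
        poly = multiply_by_linear(poly, range(k))
    for i in range(k):                          # factors omitting x_i, power r_i - 1
        others = [j for j in range(k) if j != i]
        for _ in range(r[i] - 1):
            poly = multiply_by_linear(poly, others)
    return poly.get(tuple(r), 0)


def identity_4_3(sizes):
    """Left-hand side of (4.3) for the composition `sizes`."""
    total = 0
    for r in product(*(range(1, m + 1) for m in sizes)):
        weight = 1
        for m, ri in zip(sizes, r):
            weight *= comb(m - 1, ri - 1)
        total += weight * count_alternating(r)
    return total


sizes = (3, 2, 2)
multinomial = factorial(sum(sizes))
for m in sizes:
    multinomial //= factorial(m)
agreement = all(constant_term_formula(r) == count_alternating(r)
                for r in product(range(1, 4), repeat=3))
```

For two kinds the direct count reduces to the values 0, 1, 2 of $F(r_1, r_2)$ given in Section 2.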



We are now in a position to obtain moments of the variables in the distribution
(4.1) by means of identities similar to (4.3). As an illustration we compute




•CT II Xi r% XJ (xi + • • • + Uxi + • • • + Xk) 


- CT II xlT'i.x i + • • • + Xk) n ~*(xt + ••• + XkY 


Jl<-0 


[:] 


(n — nO 
n<* ) 


(«> 


or 


(4.13) L ° _ J) A (^ _ J) rw> - <£ 

1 

The moments of $r_{ij}$ may be computed from identities of this type together with
(3.3). The first two moments are

(4.14)  $E(r_{ij}) = (n - n_i + 1)^{(2)} n_i^{(j)} / n^{(j+1)}$

(4.15)  $E(r_{ij}^{(2)}) = n_i^{(2j)} (n - n_i)^{(2)} (n - n_i + 1)^{(2)} / n^{(2j+2)}$

(4.16)  $E(r_{ij} r_{it}) = n_i^{(j+t)} (n - n_i)^{(2)} (n - n_i + 1)^{(2)} / n^{(j+t+2)}$

E(rnr. t ) = (m -j -!)(».-<-1) ”^ 7 ” {(n,- -j + l) (,) (n. - < +l) (s) 


+ 2(n - n, - n.)(n< - j + l)(n, - t + l)(n, - t + n, - j) 

+ (n — n,- — n,) m [(n, — t + 1)® + 2 (n< — j + 1) (n, — f + 1) 
+ (nt - j + 1) (,) ] + 2(n - ni - n,) (,) (n< - j + n, - t + 2) 

+ (»-*- O®} + 2(n,- - j - 1) " J + D 


(4.17) 


•(». - f + l) l!> + (n - n< — n,)[2(n< - j + l)(n, - t + 1) 

+ (»« — < + l) l!) ] + (n — n< — n t ) l,) [2(n, — t + 1) + (»,• — j + 1)] 

+ (»-»<- „.)<*>} + 2 (n. - f - 1) n 7~ 1 l W lr ' - * + 1) 

• (»< — j + 1)® + (n — n< — n,)[2(n< — j + l)(n, — < + 1) 

+ (»< — i + 1) (J> ] + (n — n< — n t ) <s) [2(n,' — j + 1) 4- (n, — t + 1)] 

+ (» — ni — «,)®} + 4 — rfJZT ) — I( n< — 3 + 1) ( n * ~ t + 1) 

+ («-«<- n,)(n< - j + n, - t + 2) + (n - n { - »,)®}. 



Such a lengthy expression as this last one can hardly be useful to the statistician,
and for this reason we shall not define variables analogous to the $s_{1i}$ and $s_{2j}$
of Section 2 and take the time and space to find their moments.


5. Asymptotic distributions. We shall show that some of the distributions
obtained previously are asymptotically normal when the $n_i$ become large in
such a way that the ratios $n_i/n$ remain fixed. The description “asymptotically
normal” means that the distribution approaches the normal distribution uniformly
over any finite region as $n_i \to \infty$. The ratios $n_i/n$ will be denoted by
$e_i$, hence $\sum e_i = 1$. The symbol $O(1/n^a)$ will represent any function such that

$\lim_{n \to \infty} n^a\, O\!\left(\dfrac{1}{n^a}\right) = L < \infty$.

We shall not, of course, be able to get any limit theorems for distributions
like (2.6) or (2.9) because the number of independent variables increases with
n. We shall consider first the distribution (2.13) whose asymptotic character
is given in the following theorem.

Theorem 1. The variables

(5.1)  $x_i = \dfrac{s_{1i} - n e_1^i e_2^2}{\sqrt{n}}, \quad i < k, \qquad x_k = \dfrac{s_{1k} - n e_1^k e_2}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(5.2)
$\sigma_{ij} = e_1^{i+j-1} e_2^3 [(i+1)(j+1) e_1 e_2 - ij e_2 - 2 e_1], \quad i, j < k,\ i \ne j$,
$\sigma_{ii} = e_1^{2i-1} e_2^3 [(i+1)^2 e_1 e_2 - i^2 e_2 - 2 e_1] + e_1^i e_2^2, \quad i < k$,
$\sigma_{ik} = e_1^{i+k-1} e_2^2 [(i+1) k e_1 e_2 - ik e_2 - e_1], \quad i < k$,
$\sigma_{kk} = e_1^{2k-1} e_2 [k^2 (e_1 - 1) e_2 - e_1] + e_1^k e_2$.

The limiting means, variances and covariances are obtained from the relations
(3.6), (3.7), (3.8), (3.13), (3.14) and (3.15).
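
The limiting covariances of Theorem 1 are simple enough to tabulate, and they can be compared with the exact variance (3.7) divided by n for a large sample. The sketch below is illustrative only: `asymptotic_cov` codes the four cases of (5.2) as read above, `exact_variance_r1i` codes (3.7), and the sample sizes are arbitrary. With k = 1 the single variable is the total number of runs of a's, and the limiting variance reduces to $e_1^2 e_2^2$.

```python
from fractions import Fraction


def falling(x, a):
    """Factorial power x^(a)."""
    result = 1
    for t in range(a):
        result *= x - t
    return result


def asymptotic_cov(i, j, k, e1):
    """Limiting covariance of x_i and x_j in Theorem 1 (indices 1..k)."""
    e2 = 1.0 - e1
    if i > j:
        i, j = j, i
    if j < k:                                   # both runs shorter than k
        value = e1 ** (i + j - 1) * e2 ** 3 * ((i + 1) * (j + 1) * e1 * e2 - i * j * e2 - 2 * e1)
        if i == j:
            value += e1 ** i * e2 ** 2
        return value
    if i < k:                                   # one short run count, one pooled count
        return e1 ** (i + k - 1) * e2 ** 2 * ((i + 1) * k * e1 * e2 - i * k * e2 - e1)
    return e1 ** (2 * k - 1) * e2 * (k * k * (e1 - 1) * e2 - e1) + e1 ** k * e2


def exact_variance_r1i(i, n1, n2):
    """Exact variance of r_1i from (3.7)."""
    n = n1 + n2
    mean = Fraction(falling(n2 + 1, 2) * falling(n1, i), falling(n, i + 1))
    second = Fraction(falling(n2, 2) * falling(n2 + 1, 2) * falling(n1, 2 * i),
                      falling(n, 2 * i + 2))
    return second + mean - mean * mean


n1, n2 = 6000, 4000
n = n1 + n2
ratio_1 = float(exact_variance_r1i(1, n1, n2) / n)
ratio_2 = float(exact_variance_r1i(2, n1, n2) / n)
limit_1 = asymptotic_cov(1, 1, 3, n1 / n)
limit_2 = asymptotic_cov(2, 2, 3, n1 / n)
```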

To demonstrate this theorem we make the substitutions

(5.3)
$n_i = n e_i, \quad i = 1, 2$,
$s_{1i} = n e_1^i e_2^2 + \sqrt{n}\, x_i, \quad i = 1, 2, \ldots, k - 1$,
$s_{1k} = n e_1^k e_2 + \sqrt{n}\, x_k$,
$r_1 = n e_1 e_2 + \sqrt{n} \sum_{i=1}^{k} x_i$,
$A = n(e_1 - e_1^k - (k-1) e_1^k e_2) + \sqrt{n} \sum_{i=1}^{k-1} i x_i$

in (2.13), and estimate the factorials by means of Stirling's formula

(5.4)  $m! = \sqrt{2\pi}\, m^{m + \frac{1}{2}} e^{-m} \left(1 + O\!\left(\dfrac{1}{m}\right)\right)$.

The result is an unwieldy expression which we shall not present at the moment.
First we note that the exponential factors cancel out because the sum of the
lower indices of a binomial or multinomial coefficient is equal to the upper index.
Also we simplify the expression by considering in detail only terms which involve
the $x_i$; the normalizing constant can be determined from the final limit function.
Any function of the parameters will be represented by the letter K. Thus in
(5.4) we need consider only the factor $m^{m + \frac{1}{2}}$. All factorials will be of the form

(5.5)  $m = na + \sqrt{n}\, L(x) + b$

where L(x) is a linear function of the $x_i$, and a and b are independent of n and
$x_i$. Now

$m^{m + \frac{1}{2}} = (na + \sqrt{n}\, L(x) + b)^{na + \sqrt{n} L(x) + b + \frac{1}{2}} = (na)^{m + \frac{1}{2}} \left(1 + \dfrac{L(x)}{a \sqrt{n}} + \dfrac{b}{an}\right)^{m + \frac{1}{2}}$

and

$\log m^{m + \frac{1}{2}} = K + \sqrt{n}\, L(x) \log na + \left(na + \sqrt{n}\, L(x) + b + \tfrac{1}{2}\right) \log\left(1 + \dfrac{L(x)}{a \sqrt{n}} + \dfrac{b}{an}\right)$

$\qquad = K + \sqrt{n}\, L(x) \log na + \left(na + \sqrt{n}\, L(x) + b + \tfrac{1}{2}\right) \left(\dfrac{L(x)}{a \sqrt{n}} + \dfrac{b}{an} - \dfrac{L^2(x)}{2 a^2 n} + O\!\left(\dfrac{1}{n^{3/2}}\right)\right)$

(5.6)  $\qquad = K + \sqrt{n}\, L(x)(1 + \log na) + \dfrac{1}{2a} L^2(x) + O\!\left(\dfrac{1}{\sqrt{n}}\right)$

so terms arising from b (and $b + \frac{1}{2}$ in the exponent) will be neglected as they
give rise only to terms independent of the $x_i$ or of order $1/\sqrt{n}$. Of course
$\log(1 + O(1/m)) = O(1/m)$. Thus, keeping significant terms only, the result of the
substitutions (5.3) and (5.4) in (2.13) after taking logarithms and using (5.6) is


-log P(r<) = K + Vn T, ^ (log neUl + D + 2 r^r* 

i i 2eie 2 

- V« (log ne\ + 1) + ^ x <) 





(5.7) + y/n ix» + (fc — l)x*^ (log ne\ + 1) — Mfc + (k — 1)®*^ 

+ 2 Vns* (log neUi + D + -jjr - Vn (2 W) (log n^ +1 + 1) 

ejes \ i / 

+ (? “•) + 0 (ts)- 

The coefficients of x<(i < k) and x* are 

\/n(log n«}e 2 + 1 — log ne\ — 1 + i log ne{ + » — i log ne? +1 — i) = 0, 
\/n( — log ne\ — 1 + k log ne\ + A; + 2 log ne*e» + 2 — k log ne\ +l — k) «* 0. 
Hence only the quadratic terms remain and we have 


(5.8) 

where 


(5.9) 


-log P = K + i'Z„ i 

'**<+■ °$i) 

1 , ijet 

* ■ t+i 

i, j < K 

e s e x 


1,1 , t' 2 e* 

2 ' i 2 ' k+l 

i < k, 

<✓2 &1&2 #1 


1^1 + i{k — 1)62 

2 ■ h+ 1 

e% ei 

i < k, 


kk 


— 4- _L 

-5 ' ~F~ + T+i 

e; e J e i e’ 


(k - 1)’ 


e\ 


It is merely a matter of straightforward multiplication of the two matrices to
verify that $\| \sigma^{ij} \|$ is the inverse of $\| \sigma_{ij} \|$, hence is a positive definite matrix.
The details of the verification will be omitted. We have then

(5.10) P - Ke~ hU<ix<xl (l + 0 


In this equation K must necessarily contain the factor 



because there 


are k + 5 factorials in the denominator and 5 in the numerator of (2.13). 
Since Ar,- = 1, this factor, in view of (5.1), may be replaced by HAx<, so 


(5.11) 


P - Ke~ i2t<u<xi nAxi 



If we restrict the x,- to any finite region R in the x-space, the function 0(1/n*) 
approaches zero uniformly as n —*• «o. Thus, if A{ < Bi are any positive 





numbers such that the corresponding values of Xi , say a< and b t , obtained by 
substituting Ai and J3, for r< in (5.1), determine a rectangular region < Xi < 
bi), which lies in R we have 

Z PW - E Re* 1 '"***' nAx< (i+o( 4-)) 

. r i- A < \ \y/n// 

(5.12) 

Jr* 

by the definition of a definite integral and Riemann’s fundamental theorem. 

We have given some details of this proof in order that it may serve as a model
for other theorems of a similar nature which will appear later, and for which
a complete proof will not be given. Two immediate consequences of Theorem 1
will now be stated as corollaries.

Corollary 1. The variable

$z = \dfrac{r - n e_1 e_2}{\sqrt{n}\, e_1 e_2}$

where r is the total number of runs of one kind of element, is asymptotically normally
distributed with zero mean and unit variance. The limiting mean and variance
were computed from (3.20) and (3.21).

Corollary 2. The variable $Q = \sum \sigma^{ij} x_i x_j$ is asymptotically distributed according
to the $\chi^2$-law with k degrees of freedom.

In exactly the same manner in which Theorem 1 was deduced from (2.13),
we may prove the following theorem corresponding to the distribution (2.14).

Theorem 2. The variables

(5.13)
$x_i = \dfrac{s_{1i} - n e_1^i e_2^2}{\sqrt{n}}, \quad i < k$,
$x_k = \dfrac{s_{1k} - n e_1^k e_2}{\sqrt{n}}$,
$y_i = \dfrac{s_{2i} - n e_1^2 e_2^i}{\sqrt{n}}, \quad i < h$,

are asymptotically normally distributed with zero means and variances and
covariances


<r X(Xi = e\ +J ^[(t + l)(j + l)eie» - ije 2 - 2ci] i, j < k, 

<r X{X{ - eV~ l e\[(i + - i\ - 2eJ + e[e\ i < k, 

& X(Xlt — *6j[(t -(- l)Ajfii(?2 — ikei — ex] t K. k t 

<tx k z k = c\ k l ej[— lze\ — ej + ejea, 





(5.14) <r WiVi = ei + *~ l e\[(i + 1)0* + — ije i — 2e%] i } j < h , 

ff viVi 888 ^2* x cJ[(i + 1) 2 ^i^ 2 ~' i 2 6i — 2 ^ 2 ] 4“ i < h y 

vxw = ei +l «* +1 [(t + 1)0 + l)etf2 - 2 i 62 - 2/6i + 46i6 2 + 2] 

i < kyj < h, 

a *kVi = c i +lfi 2[fc(i + l)^i c 2 ~ 2(fc — l)e 2 — 0 l) e i + 2 ei 62 ] j < A. 

These limiting variances were computed from the variances and covariances
given in Section 3. We have chosen the variable $s_{2h}$ of (2.14) as the dependent
variable. The proof of this theorem is omitted. From it the following corollaries
are deduced immediately.

Corollary 3. If $z_i = x_i$ and $z_{k+i} = y_i$ of (5.13) and $\| \sigma^{ij} \|$ (i, j = 1, 2,
..., k + h − 1) denotes the inverse of (5.14), then the variable $Q = \sum \sigma^{ij} z_i z_j$ is
asymptotically distributed according to the $\chi^2$-law with k + h − 1 degrees of freedom.

Corollary 4. // s, = + $ 2 » denotes the total number of runs of both kinds of 

elements of length i, and Sk the total number of runs of length greater than k — 1, 
then the variables 


Xi = 8 i n ( € i e * 4 ~ £ 2 ^ 1 ) { < k 


(5.15) 


Xk 


y/n 

8k — n(ei6 2 4" e 2 ei) 

Vn 


are asymptotically normally distributed with zero means and variances and 
covariances 


(5.16) cf%j — <TxiXj 4" 4" VtjVi 4“ a ViVi . 

We have put h = k in Theorem 2 to obtain this result. The terms on the right 
of (5.16) are defined by (5.14); terms which do not appear there may be found 
by interchanging Ci and e 2 in one of the relations. For example cr VJkVJk is given by 
interchanging e x and e 2 in the fourth equation of the set (5.14). 

Corollary 5. The variable Q = 2<r lJ XiXj where the Xi are defined by (5.15) 
and || <t x} || is the inverse of (5.16), is asymptotically distributed according to the 
X~law with k degrees of freedom . 

Corollary 6. If $s$ denotes the total number of runs of both kinds of elements, then the variable

$$z = \frac{s - 2ne_1e_2}{2\sqrt{n}\,e_1e_2}$$

is asymptotically normally distributed with zero mean and unit variance. This is the result derived by Wald and Wolfowitz [13].
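
As a hedged illustration of Corollary 6, the following sketch computes the standardized run count for an observed sequence of two kinds of elements. The function names `count_runs` and `runs_z` are assumptions made for this example, and the proportions $e_1$, $e_2$ are taken to be the observed $n_1/n$, $n_2/n$.

```python
import math
from itertools import groupby


def count_runs(seq):
    """Total number of runs (maximal blocks of equal adjacent symbols)."""
    return sum(1 for _ in groupby(seq))


def runs_z(seq):
    """Standardized run count z = (s - 2*n*e1*e2) / (2*sqrt(n)*e1*e2).

    Assumes seq contains exactly two distinct symbols; e1 and e2 are the
    observed proportions n1/n and n2/n (an assumption of this sketch).
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two kinds of elements")
    n = len(seq)
    e1 = sum(1 for x in seq if x == symbols[0]) / n
    e2 = 1.0 - e1
    s = count_runs(seq)
    return (s - 2 * n * e1 * e2) / (2 * math.sqrt(n) * e1 * e2)
```

Values of `runs_z` far from zero in absolute value would suggest a departure from random arrangement; for short sequences the normal approximation is only rough.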


6. Asymptotic distributions for k kinds of elements. We now investigate the asymptotic character of the distribution (4.2)

$$P(r_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \bigg/ \begin{bmatrix} n \\ n_i \end{bmatrix} \tag{6.1}$$

where $r_i$ is the total number of runs of the $i$th kind of element.

Theorem 1. If $k > 2$, the variables

$$x_i = \frac{r_i - ne_i(1 - e_i)}{\sqrt{n}} \tag{6.2}$$

are asymptotically normally distributed with zero means and variances and covariances

$$\sigma_{ii} = e_i^2(1 - e_i)^2, \qquad \sigma_{ij} = e_i^2 e_j^2 \quad (i \neq j). \tag{6.3}$$

The restriction $k > 2$ is made because in the case $k = 2$ the correlation between the two variables approaches one, and the numbers $\sigma_{ij}$ are all equal. The result may be called a degenerate normal distribution and might be included in the theorem in this sense; we have chosen to omit it because this case is better taken care of by Corollary 1 of the previous section.

The proof of this theorem will be simplified if in the moments (4.5) we replace the numbers $n_i - 1$ by $n_i$. This substitution will not, of course, affect the limiting moments. Hence we consider the variables $v_i$ with moments given by


(^ t „<«<> r»- Sa *i 

n «4“‘>) - — xq-vj 

i / Vn 

Ln<_ 


and shall show that 


are asymptotically normally distributed with zero means and variances and covariances (6.3). It is possible to prove this statement by showing that the characteristic function (Fourier transform) obtained by substituting $e^{i\theta_i}$ for $t_i$ in the moment generating function

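$$\varphi_n(t_i) = \text{Coef. of } \prod_{i=1}^{k} x_i^{n_i} \text{ in } \prod_{i=1}^{k} (x_1 + \cdots + x_{i-1} + t_i x_i + x_{i+1} + \cdots + x_k)^{n_i} \bigg/ \begin{bmatrix} n \\ n_i \end{bmatrix} \tag{6.6}$$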


approaches 







as $n \to \infty$. This method is not appropriate for proving a similar theorem which appears in Part II, and we prefer to give here a demonstration that will suffice for both theorems.

In order to prove our theorem we consider the general term in the coefficient of $\prod x_i^{n_i}$ in (6.6)

$$\prod_{i} \begin{bmatrix} n_i \\ m_{ij} \end{bmatrix} \prod_{i} t_i^{m_{ii}} \bigg/ \begin{bmatrix} n \\ n_i \end{bmatrix} \tag{6.7}$$

in which

$$\sum_{i=1}^{k} m_{ij} = n_j \tag{6.8}$$

must be required as well as the usual restriction on indices of a multinomial coefficient, $\sum_{j=1}^{k} m_{ij} = n_i$. Therefore only $(k - 1)^2$ of the indices are independent. Clearly $m_{ii} = v_i$. Now without concerning ourselves about the statistical significance of the variables $m_{ij}$, let us consider their distribution


(6.9) 




in which the variables corresponding to the values $i, j = 1, 2, \cdots, k - 1$ will be chosen as the independent ones. We shall now prove a theorem from which Theorem 1 follows immediately.

Theorem 2. The variables

$$x_{ij} = \frac{m_{ij} - ne_ie_j}{\sqrt{n}}, \qquad i, j = 1, 2, \cdots, k - 1 \tag{6.10}$$

are asymptotically normally distributed with zero means and variances and covariances given by

$$\begin{aligned} \sigma_{ij,pq} &= e_ie_je_pe_q, \\ \sigma_{ij,ip} &= -e_i(1 - e_i)e_je_p, \\ \sigma_{ij,ij} &= e_ie_j(1 - e_i)(1 - e_j). \end{aligned} \tag{6.11}$$

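First it is to be noted that the moments of the $m_{ij}$ are easily obtained from the identity

$$\sum \prod_{i} \begin{bmatrix} n_i \\ m_{ij} \end{bmatrix} = \begin{bmatrix} n \\ n_j \end{bmatrix} \tag{6.12}$$

as follows

$$\sum \prod_{i,j} m_{ij}^{(a_{ij})} \prod_{i} \begin{bmatrix} n_i \\ m_{ij} \end{bmatrix} = \prod_{i} n_i^{(\sum_j a_{ij})} \sum \prod_{i} \begin{bmatrix} n_i - \sum_j a_{ij} \\ m_{ij} - a_{ij} \end{bmatrix} = \prod_{i} n_i^{(\sum_j a_{ij})} \begin{bmatrix} n - \sum_{i,j} a_{ij} \\ n_j - \sum_i a_{ij} \end{bmatrix}$$

and on dividing this last relation by $\begin{bmatrix} n \\ n_j \end{bmatrix}$ we obtain

$$E\Big(\prod_{i,j} m_{ij}^{(a_{ij})}\Big) = \prod_{i} n_i^{(\sum_j a_{ij})} \prod_{j} n_j^{(\sum_i a_{ij})} \bigg/ n^{(\sum_{i,j} a_{ij})} \tag{6.13}$$

from which the moments (6.11) and the means in (6.10) were computed.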

The proof of the theorem is similar to that of Theorem 1 in Section 5. We make the substitutions

$$n_i = ne_i, \qquad m_{ki} = n_i - \sum_{j=1}^{k-1} m_{ji}, \qquad m_{ik} = n_i - \sum_{j=1}^{k-1} m_{ij},$$

$$m_{kk} = 2n_k + \sum_{i,j=1}^{k-1} m_{ij} - n, \qquad m_{ij} = ne_ie_j + \sqrt{n}\,x_{ij}$$

in (6.9) and employ Stirling's formula exactly as before. The details are too similar to warrant repetition. The final result is

(6.14) Ditmi) = n dx ti (l + 0 . 

where $\|\sigma^{ij,pq}\|$ is the inverse of (6.11) and is defined by


cr ii,pq 



o,v = 1 ■ 1 


1 


= -j H-1- 

6k 6%€k 6j€k 


+ 


e.e, 


*i,iv _ 1 i 1 ii.vi _ _1 _I 1 

a -r -j, a -r ■ 

e\e k e k eifit e* 

Theorem 1 is a corollary of Theorem 2. Also we may state these additional results:

Corollary 1. If $k$ $(\geq 3)$ kinds of elements are arranged at random and $r$ denotes the total number of runs of all kinds of elements, then the variable

$$x = \frac{r - n(1 - \sum e_i^2)}{\sqrt{n}}$$

is asymptotically normally distributed with zero mean and variance

$$\sigma^2 = \sum e_i^2 - 2\sum e_i^3 + \Big(\sum e_i^2\Big)^2$$

where $e_i$ is the proportion of elements of the $i$-th kind.

Corollary 2. The variable $Q = \sum \sigma^{ij} x_i x_j$, where the $x_i$ are defined by (6.2) and $\|\sigma^{ij}\|$ is the inverse of (6.3), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.
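
The centering and limiting variance in Corollary 1 can be compared with a brute-force computation for small samples. The sketch below is only an illustration under stated assumptions: the helper names `exact_mean_runs` and `limiting_mean_and_variance` are hypothetical, and the enumeration over all permutations is feasible only for very small $n$.

```python
from fractions import Fraction
from itertools import groupby, permutations


def exact_mean_runs(counts):
    """Exact mean of the total number of runs over all arrangements.

    counts[i] is the fixed number of elements of kind i. Enumerating all
    permutations of the labelled elements weights every distinct
    arrangement equally, so the average is the exact expectation.
    Only feasible for small n (n! permutations are visited).
    """
    elements = [kind for kind, c in enumerate(counts) for _ in range(c)]
    total = 0
    n_perms = 0
    for perm in permutations(elements):
        total += sum(1 for _ in groupby(perm))
        n_perms += 1
    return Fraction(total, n_perms)


def limiting_mean_and_variance(counts):
    """Centering n(1 - sum e_i^2) and limiting variance from Corollary 1."""
    n = sum(counts)
    e = [Fraction(c, n) for c in counts]
    s2 = sum(x**2 for x in e)
    s3 = sum(x**3 for x in e)
    return n * (1 - s2), s2 - 2 * s3 + s2**2
```

For counts (2, 2, 1) the enumerated mean is 21/5 while the centering value is 16/5; a bounded difference of this kind disappears after division by $\sqrt{n}$.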

As was mentioned in Section 4, we could define variables $s_{ij}$ ($i = 1, 2, \cdots, k$ and $j = 1, 2, \cdots, h_i$, the $h_i$ being a set of $k$ arbitrary integers) with a distribution similar to (2.14). If one worked through the details he would find, no doubt, that these variables are asymptotically normal. The matrix of variances and covariances is so complicated, however, that such a theorem would hardly be useful to the statistician, and the author does not feel that it would be worthwhile to go through the long and tedious details merely for the sake of completeness.


Part II 

Instead of having the number of elements of each kind fixed, we now suppose that they are randomly drawn from a binomial or multinomial population. The numbers $n_i$ thus become random variables subject only to the restriction that $\sum n_i = n$, the sample number. The development will be entirely analogous to that of Part I, and the same notation will be used. The probability associated with the $i$th kind of element will be denoted by $p_i$.

7. Distributions and moments. The major part of the derivation of the various distribution functions has already been done in Sections 2 and 3. With the distributions of these sections we need only employ the fundamental relation

$$P(X, Y) = P_1(X \mid Y)P_2(Y) \tag{7.1}$$

in order to obtain the distributions required here. $X$ will represent the set of variables $r_{ij}$ or $r_i$, and $Y$ the variables $n_i$. For the binomial population $P_2(Y)$ will be

$$P(n_1, n_2) = \binom{n}{n_1} p_1^{n_1} p_2^{n_2}. \tag{7.2}$$

Therefore we may write down at once the distributions 

(7.3) Pin,, »,.) - ['■ ]MF(r,, r,)p," 

(7.4) Pin,, «,) . [£]("■ + Pi', 

(7.5) Pin, »,) - " j) (”■ + ') P r' 

(7.o) Pi.,,, n,) - [;;](”■ - 4 - f_- ^ - *)(' + ‘)rfW, 

(B (h 1)$2A l\ jp/ \ nl 

'( a»-l )nn,»)p,v,, 

*- 1 , ... ,k,j = 1 , ••• ,h, 


(7.7) 





corresponding to the distributions (2.6), (2.9), (2.11), (2.13) and (2.14) respectively. Of course there is some dependence among the arguments. In (7.4), for example, $n_1$ is determined by $\sum i r_{1i} = n_1$, and $n_2$ by $n - n_1 = n_2$. In the last three distributions one of the $n_i$ is independent and one may sum these with respect to $n_1$ from zero to $n$ and obtain the distributions of the $r$'s alone. The results of such summations are quite cumbersome and in some cases can only be indicated, so we shall retain the $n_i$ as relevant variables. This remark applies also to the multinomial distribution.

We shall obtain expressions for the joint moments of the variables in these distributions. It is clear that the moments in Section 3 will be of considerable aid; for, using the notation of (7.1), we have

$$E(f(X)g(Y)) = \sum_{X,Y} f(X)g(Y)P(X, Y) = \sum_{Y} \Big[\sum_{X} f(X)P_1(X \mid Y)\Big] g(Y)P_2(Y) \tag{7.8}$$

and the sum in the bracket on the right has been computed in Section 3. It remains only for us to multiply the previous moments by $g(Y)P_2(Y)$ and sum on $Y$. Corresponding to (3.4), (3.12), (3.9) and (3.19) we have

(7.9) eU [ a> fi r[ a A = z «}■’(«, + d ( io<) ( n ~ sio< 7 2a< Wpr, 

\ i / «i-o \ m — / 

(7.10) n «!?’) - t »!•’(», + 1) ,M (” ' ** " zV Wr>,", 

\ 1 / »i-o \ Tii — 2ta< / 

(7.11) E(n!->r!>') - ± »!•'(», + 1)»'(” - 

n i-0 V*1 — 0/ 

Z ni a) 8i Za(> s? bi) ( ni - 2™* + - A 

!.•!.»! \ Si — X'Oi — 1 / 

\ $* — 2 6 / — 1 / 

for moments from (7.4), (7.6), (7.5) and (7.7) respectively. In order to perform 
the summations indicated in these last relations it is necessary to expand the 
factors multiplying the binomial coefficient in factorial powers of its lower 
index. That is, we must write 

(7.13) ni a> (nj -f- 1) <6) = Z Ci(n, a, b)(ni — b) li \ 

t-0 

Again it is not possible to give a simple expression for the coefficients Ci(n, a, b) 
in general, but for the first few moments they present no difficulty. For example 
from (7.9) 


(7.12) 


JSl(ni 0) fi ri = 





Efyhru) = 2 »i(n — ni ■+■ 1) ( n * . ^ p" 1 

»i-« \ ni — t f 


(7,14) r 

" H |^(« - * + 1) nj l _ i 1 j + (n - 2t)(n - * - 1) 

•(::* :*)--<»- 

= [t(n — t + 1) + (n — 2It) (« — t — l)pi — (n — * — 1)p?]p{ p*. 

We give below some means, variances and covariances which will be required 
later. 


= 23 [*(tt — t + 1) ■+• (n — 2t) (ni — t) + (ni — t) <2> ] 


•CnV-V)^ 


$$E(r_{1i}) = p_1^i p_2[(n - i - 1)p_2 + 2],$$

$$E(s_{1k}) = p_1^k[(n - k)p_2 + 1],$$

Vfuru = Pi + >2 {(» - t - j)®P 2 + (n - i - j)pi(l + 5pi) + 6p? 

- [(n - i - 1)P» + 2][(» - j - l)p* + 2]}, 
<r, U r l{ = p\*p\ {(n - 2t) <s> pi + (n - 2i)pi(l + 5pj) + 6p? 

- [(n - i - l)p« + 2] 2 } + pip*[(n - i - l)Ps + 2], 

(7.15) <r rii r„ = P 1 P 2 {(n - i - j - 2) <J) pip’ + 4(n - i - j - l)PiP» + 2 

- I(n - t - 1)P> + 2][(n — j — l)Pi + 2], 
O’.,,.,* = p{ +t ps{(» - * - * + 1) W> - 2(» - i - k) {t) pi 

+ (n - i - k - l) <2> Pi ~ [(« ~ i ~ D Pt + 2][(n - fc)p» + 1]}, 

o'*!**!* = Pi k {( n -2 k + 1) <2) - 2(n - 2fc) (2> pi + (n - 2 k) m p\ 

- [(n - k)pt + l]*j + pt[(n - k)pt + 1], 
= P 1 P 2 {(n - k - j - 2) <2) pip* + 2 (n - k - j - l)Pi(l + Pt) 

+ 2(1 + Pi) - Pi[(n - k)pt + l][(n - j - l)Pi + 2]}. 
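
The two means $E(r_{1i})$ and $E(s_{1k})$ listed at the head of (7.15) can be checked by direct enumeration for a small sample. The following is a minimal sketch under stated assumptions: the function names are hypothetical, $r_{1i}$ is read as the number of runs of first-kind elements of length exactly $i$ (with $i < n$), and $s_{1k}$ as the number of such runs of length $k$ or more.

```python
from fractions import Fraction
from itertools import groupby, product


def exact_run_means(n, p1, i, k):
    """Exact E(r_1i) and E(s_1k) for n independent draws.

    r_1i: number of runs of kind-1 elements of length exactly i.
    s_1k: number of runs of kind-1 elements of length k or more.
    All 2**n sequences are enumerated, so keep n small.
    """
    p2 = 1 - p1
    mean_r = Fraction(0)
    mean_s = Fraction(0)
    for seq in product((1, 2), repeat=n):
        n1 = seq.count(1)
        prob = p1**n1 * p2 ** (n - n1)
        lengths = [len(list(g)) for kind, g in groupby(seq) if kind == 1]
        mean_r += prob * sum(1 for length in lengths if length == i)
        mean_s += prob * sum(1 for length in lengths if length >= k)
    return mean_r, mean_s


def formula_run_means(n, p1, i, k):
    """Closed forms quoted in the text for E(r_1i) and E(s_1k)."""
    p2 = 1 - p1
    return (p1**i * p2 * ((n - i - 1) * p2 + 2),
            p1**k * ((n - k) * p2 + 1))
```

Passing `p1` as a `Fraction` keeps the comparison exact, so agreement is not a matter of rounding.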

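In order to obtain the distribution of runs in samples from a multinomial population, we multiply the distributions of Section 4 by

$$P(n_i) = \begin{bmatrix} n \\ n_i \end{bmatrix} \prod p_i^{n_i}. \tag{7.16}$$

Corresponding to (4.1) and (4.2) then, we have

$$P(r_{ij}, n_i) = \prod_{i=1}^{k} \begin{bmatrix} r_i \\ r_{ij} \end{bmatrix} F(r_i) \prod p_i^{n_i}, \tag{7.17}$$

$$P(r_i, n_i) = \prod \binom{n_i - 1}{r_i - 1} F(r_i) \prod p_i^{n_i}. \tag{7.18}$$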





In (7.17) $r_{ij}$ is the number of runs of length $j$ of elements with probability $p_i$. In (7.18) $r_i$ is the total number of runs of elements with probability $p_i$. As before, we shall investigate in detail only the distribution (7.18). The moments of $n_i$ and $r_i$ follow at once from (7.8) and (4.5)

$$E\Big(\prod n_i^{(a_i)}u_i^{(b_i)}\Big) = \sum \prod n_i^{(a_i)}(n_i - 1)^{(b_i)} \begin{bmatrix} n - \sum b_i \\ n_i - b_i \end{bmatrix} \prod p_i^{n_i} \tag{7.19}$$

where $u_i = n_i - r_i$. The means, variances and covariances of the $r_i$ are

$$\begin{aligned} E(r_i) &= np_i(1 - p_i) + p_i^2, \\ \sigma_{r_ir_j} &= -np_ip_j(1 - 2p_i - 2p_j + 3p_ip_j) - p_ip_j(2p_i + 2p_j - 5p_ip_j), \\ \sigma_{r_ir_i} &= np_i(1 - 4p_i + 6p_i^2 - 3p_i^3) + p_i^2(3 - 8p_i + 5p_i^2). \end{aligned} \tag{7.20}$$
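
The moments (7.20) lend themselves to a direct check by enumerating every sequence of a small multinomial sample. The sketch below is illustrative only; the function names are hypothetical, and the additive constant $p_i^2$ in the mean is the value obtained by direct counting (a run of kind $i$ starts at the first position with probability $p_i$ and at each later position with probability $p_i(1 - p_i)$).

```python
from fractions import Fraction
from itertools import groupby, product


def exact_run_moments(p, n):
    """Exact means and covariance matrix of r_i (runs of kind i).

    p: list of Fractions summing to one; n: sample size.
    Enumerates all len(p)**n sequences, so keep n small.
    """
    k = len(p)
    mean = [Fraction(0)] * k
    second = [[Fraction(0)] * k for _ in range(k)]
    for seq in product(range(k), repeat=n):
        prob = Fraction(1)
        for x in seq:
            prob *= p[x]
        r = [0] * k
        for kind, _ in groupby(seq):
            r[kind] += 1
        for a in range(k):
            mean[a] += prob * r[a]
            for b in range(k):
                second[a][b] += prob * r[a] * r[b]
    cov = [[second[a][b] - mean[a] * mean[b] for b in range(k)]
           for a in range(k)]
    return mean, cov


def moments_7_20(p, n):
    """Means, variances and covariances as written in (7.20)."""
    k = len(p)
    mean = [n * pi * (1 - pi) + pi**2 for pi in p]
    cov = [[None] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            pa, pb = p[a], p[b]
            if a == b:
                cov[a][b] = (n * pa * (1 - 4 * pa + 6 * pa**2 - 3 * pa**3)
                             + pa**2 * (3 - 8 * pa + 5 * pa**2))
            else:
                cov[a][b] = (-n * pa * pb * (1 - 2 * pa - 2 * pb + 3 * pa * pb)
                             - pa * pb * (2 * pa + 2 * pb - 5 * pa * pb))
    return mean, cov
```

Exact rational arithmetic is used so that the enumerated moments and the closed forms can be compared for equality rather than to a tolerance.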


8. Asymptotic distributions from binomial population. We turn our attention first to the distribution (7.7) and state a theorem analogous to Theorem 2 of Section 5.

Theorem 1. The variables

$$\begin{aligned} u_i &= x_i = \frac{r_{1i} - np_1^ip_2^2}{\sqrt{n}}, & i &= 1, \cdots, k - 1, \\ u_k &= x_k = \frac{s_{1k} - np_1^kp_2}{\sqrt{n}}, \\ u_{k+i} &= y_i = \frac{r_{2i} - np_1^2p_2^i}{\sqrt{n}}, & i &= 1, \cdots, h - 1, \\ u_{k+h} &= z = \frac{n_1 - np_1}{\sqrt{n}} \end{aligned} \tag{8.1}$$

are asymptotically normally distributed with zero means and variances and covariances

**i*i = Pipl “ (2 i + l)pi 4 p* + 2 p\ i+1 p \, 

®*i*i — — (* + j + l)pi +/ pi + 2p{ +m p« t 
Sx ( xk = ~(i + k + l)p\ +k pl + p{ +i+1 pi , 

*»*»* = PiPs - (2fc + l)p“p», 

= “(* + j + l)p}pj + ' + 2p\p[ +i+ \ 

(8.2) o ym = p\p\ - (2i + ljpxpl* + 2p!p* <+1 , 

= - (i +j + 3)p2 +2 pj +S + 2pi +1 pt +1 , 

~ “(& + J + 2)pi +s p» +1 + p* +1 p»(l + pj), 

ff»i» = ipiPi + Pi +, P>(1 - 4pj), 
ffx h . = (k + 1)PiPs - Pi(l + Pt), 

«*i. = *p!p» + PiPj + 1 (1 - 4pi), 


= PiPj . 





We have taken $s_{2h}$ and $n_2$ to be the dependent variables of (7.7). The method of proof of this theorem is the same as that of Theorem 1 in Section 5, and will be omitted. As consequences of the theorem we have

Corollary 1. The variable

$$Q = \sum_{i,j=1}^{k+h} \sigma^{ij} u_i u_j$$

is asymptotically distributed according to the $\chi^2$-law with $k + h$ degrees of freedom.

Corollary 2. Any subset $u_{i_1}, u_{i_2}, \cdots, u_{i_m}$ of the variables (8.1) is asymptotically normally distributed with zero means and variances and covariances $\|\sigma_{i_j i_k}\|$, and

$$Q = \sum \sigma^{i_j i_k} u_{i_j} u_{i_k}$$

is asymptotically distributed according to the $\chi^2$-law with $m$ degrees of freedom. $\|\sigma^{i_j i_k}\|$ is the inverse of $\|\sigma_{i_j i_k}\|$.

Corollary 3. If $s_i = r_{1i} + r_{2i}$ represents the total number of runs of length $i$ of both kinds of elements, and $s_k$ the number of runs of length greater than $k - 1$, then the variables

$$\begin{aligned} x_i &= \frac{s_i - n(p_1^ip_2^2 + p_1^2p_2^i)}{\sqrt{n}}, \qquad i = 1, \cdots, k - 1, \\ x_k &= \frac{s_k - n(p_1^kp_2 + p_1p_2^k)}{\sqrt{n}} \end{aligned} \tag{8.3}$$

are asymptotically normally distributed with zero means and variances and covariances

$$\sigma_{ij} = \sigma_{x_ix_j} + \sigma_{x_iy_j} + \sigma_{x_jy_i} + \sigma_{y_iy_j} \tag{8.4}$$

where the terms on the right of (8.4) are defined by (8.2). We have put $h = k$ in Theorem 1 to obtain this result.

Corollary 4. The variable

$$Q = \sum_{i,j=1}^{k} \sigma^{ij}x_ix_j \tag{8.5}$$

where the $x_i$ are defined by (8.3) and $\|\sigma^{ij}\|$ is the inverse of (8.4), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 5. If $r$ denotes the total number of runs of both kinds of elements, then

$$\frac{r - 2np_1p_2}{2\sqrt{np_1p_2(1 - 3p_1p_2)}} \tag{8.6}$$

is asymptotically normally distributed with zero mean and unit variance. This is the result obtained by Wishart and Hirshfeld [11].





9. Asymptotic distributions from the multinomial population. In this section we assume $k > 2$ to avoid degenerate distributions. Because of the function $F(r_i)$ in (7.18) we do not investigate this distribution directly, but derive a more general asymptotic distribution as was done in Section 6. We consider the distribution

$$D(m_{ij}, n_i) = \prod_{i} \left( \begin{bmatrix} n_i \\ m_{ij} \end{bmatrix} p_i^{n_i} \right) \tag{9.1}$$

corresponding to (6.9). This is derived from (7.19) in the same manner as (6.9) was from (4.5). As before, we have replaced the numbers $n_i - 1$ in (7.19) by $n_i$, an unessential change as far as the asymptotic theory is concerned. We recall that

$$r_i = n_i - m_{ii}, \tag{9.2}$$

hence we need only show that the variables on the right are asymptotically normally distributed in order to have the same result for the $r_i$. Corresponding to Theorem 2 of Section 6, we state

Theorem 1. The variables

$$\begin{aligned} x_{ij} &= \frac{m_{ij} - np_ip_j}{\sqrt{n}}, & i, j &= 1, \cdots, k - 1, \\ x_i &= \frac{n_i - np_i}{\sqrt{n}}, & i &= 1, \cdots, k - 1 \end{aligned} \tag{9.3}$$

are asymptotically normally distributed with zero means and variances and covariances


(9.4) 


= 3 piPjPapt , 

= “"3 P%P»Pt f 
PiVii 1 3p*py), 

= *~3 PiPj , 

&ii»8 = 2 PiP* , 


^*'*',* — 2pi( 1 Pi), 

*i.i = P<(1 ~ P<)- 

In these relations the symbols are defined by 


<ru,it = -3 ViPiPt i 
m.ii = Vbii\ ~ 3 pi), 

& a,a = P?(l + 2 — 3p*), 

*</.« = -2 PiPiP., 

<ru.i = P<P»(1 - 2p<), 

= ~P*Pi i 


°iiM ~ > Vij.t ~ i ff i.i ~ 

and different literal subscripts represent different numerical subscripts. These moments have been computed by means of the identity (6.12). The proof of the theorem is like that of Theorem 2 of Section 6 and will be omitted. We can now give the limiting form of the distribution of the $r_i$ in (7.18) as

Corollary 1. The variables

$$x_i = \frac{r_i - np_i(1 - p_i)}{\sqrt{n}}, \qquad i = 1, 2, \cdots, k \tag{9.5}$$

are asymptotically normally distributed with zero means and variances and covariances

$$\begin{aligned} \sigma_{ii} &= p_i(1 - p_i) - 3p_i^2(1 - p_i)^2, \\ \sigma_{ij} &= -p_ip_j(1 - 2p_i - 2p_j + 3p_ip_j). \end{aligned} \tag{9.6}$$

These limiting moments follow at once from equations (7.20).

Corollary 2. The variable

$$Q = \sum \sigma^{ij}x_ix_j$$

where the $x_i$ are defined by (9.5) and $\|\sigma^{ij}\|$ is the inverse of (9.6), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 3. If $r = \sum r_i$ denotes the total number of runs, then

$$x = \frac{r - n(1 - \sum p_i^2)}{\sqrt{n}}$$

is asymptotically normally distributed with zero mean and variance

$$\sigma^2 = \sum p_i^2 + 2\sum p_i^3 - 3\Big(\sum p_i^2\Big)^2.$$


The author would like to record here his gratitude to Professor S. S. Wilks 
who suggested the problem and under whose direction this paper was written. 


REFERENCES 

[1] Karl Pearson, The Chances of Death and Other Studies in Evolution, London, 1897, Vol. I, Chap. 2.

[2] Karl Marbe, Naturphilosophische Untersuchungen zur Wahrscheinlichkeitslehre, Leipzig, 1899.

[3] Karl Marbe, Die Gleichförmigkeit in der Welt, München, 1916.

[4] Karl Marbe, Mathematische Bemerkungen, München, 1916.

[5] Karl Marbe, Grundfragen der angewandten Wahrscheinlichkeitsrechnung, München, 1934.

[6] H. Grünbaum, Isolierte und reine Gruppen und die Marbe'sche Zahl "p", Würzburg, 1904.

[7] H. Bruns, Wahrscheinlichkeitsrechnung und Kollektivmasslehre, Leipzig, 1906, p. 216.

[8] L. v. Bortkiewicz, Die Iterationen, Berlin, 1917.

[9] R. v. Mises, Zeit. f. angew. Math. u. Mech., Vol. 1 (1921), p. 298.

[10] E. Ising, Zeit. f. Physik, Vol. 31 (1925), p. 253.

[11] J. Wishart and H. O. Hirshfeld, London Math. Soc. Jour., Vol. 11 (1936), p. 227.

[12] W. L. Stevens, Annals of Eugenics, Vol. 9 (1939), p. 10.

[13] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940).

[14] J. A. Joseph, Annals of Math. Stat., Vol. 10 (1939), p. 293.

Princeton University; 

Princeton, N. J. 



A GENERALIZATION OF THE LAW OF LARGE NUMBERS 

By Hilda Geiringer

It is well known that the law of large numbers can be established for dependent as well as for independent chance variables by using Tchebycheff's inequality [1] and assuming that the variance of the sum of the variables tends towards infinity less rapidly than $n^2$.

In recent years v. Mises has introduced the notion of statistical functions [2] and has shown that, under certain assumptions, the law of large numbers is still valid if, instead of the arithmetic mean of the $n$ observations $x_1, \cdots, x_n$, a statistical function of these observations is considered. For example in the very special case, where the $n$ collectives which have been observed are identical $k$-valued arithmetic distributions with probabilities $p_1, \cdots, p_k$ corresponding to the attributes $c_1, \cdots, c_k$ and with observed relative frequencies $n_1/n, \cdots, n_k/n$, one obtains the result: It is to be expected for every $\epsilon > 0$ with a probability $P_n$ converging towards one as $n \to \infty$, that $|f(n_1/n, \cdots, n_k/n) - f(p_1, \cdots, p_k)| < \epsilon$ under very general conditions concerning the function $f$.

In the present paper we shall generalize these new results so that they will apply also to collectives which are not independent.

1. Lemma concerning alternatives. Let us consider the $n$-dimensional collective consisting of a sequence of $n$ trials and let us assume that the $n$ trials are alternatives, i.e. for each trial there are only two possible results which we denote by “success,” “failure,” by “occurrence,” “non-occurrence” or by “1,” “0.” The total result of the $n$ trials is expressed by $n$ numbers each equal to 0 or 1. Let $v(x_1, x_2, \cdots, x_n)$ be the probability of obtaining the result $x_1$ at the first trial, $x_2$ at the second one, $\cdots$, $x_n$ at the last one ($x_\nu = 0, 1$; $\nu = 1, \cdots, n$). In the same way we introduce $v_{12}(x, y) = \sum_{x_3, \cdots, x_n} v(x, y, x_3, \cdots, x_n)$ and generally $v_{\mu\nu}(x, y)$ as the probability that the $\mu$th result equals $x$, the $\nu$th equals $y$ ($\mu \neq \nu$), and finally let $v_\mu(x) = \sum_y v_{\mu\nu}(x, y)$ be the probability that the $\mu$th result equals $x$. In particular let us write

$$v_\mu(1) = p_\mu, \qquad v_{\mu\nu}(1, 1) = p_{\mu\nu}, \qquad (\mu, \nu = 1, \cdots, n;\ \mu \neq \nu),$$

$p_\mu$ being the probability of success in the $\mu$th trial and $p_{\mu\nu}$ the probability of simultaneous success both in the $\mu$th and $\nu$th trials.

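The variance $s_n^2$ of the sum $(x_1 + \cdots + x_n)$ is easily found:

$$\begin{aligned} s_n^2 &= \operatorname{Var}(x_1 + \cdots + x_n) = \sum_{x_1, \cdots, x_n} (x_1 + \cdots + x_n - p_1 - \cdots - p_n)^2\, v(x_1, \cdots, x_n) \\ &= \sum_{x_1, \cdots, x_n} (x_1 - p_1)^2\, v(x_1, \cdots, x_n) + \cdots + 2\sum_{x_1, \cdots, x_n} (x_1 - p_1)(x_2 - p_2)\, v(x_1, \cdots, x_n) + \cdots \\ &= \sum_{x_1} (x_1 - p_1)^2\, v_1(x_1) + \cdots + 2\sum_{x_1, x_2} (x_1 - p_1)(x_2 - p_2)\, v_{12}(x_1, x_2) + \cdots \\ &= p_1(1 - p_1) + \cdots + p_n(1 - p_n) + 2(p_{12} - p_1p_2) + \cdots + 2(p_{n-1,n} - p_{n-1}p_n). \end{aligned}$$

Thus:

$$s_n^2 = \operatorname{Var}(x_1 + \cdots + x_n) = \sum_{\nu=1}^{n} p_\nu(1 - p_\nu) + 2\sum_{\substack{\mu,\nu=1 \\ \mu<\nu}}^{n} (p_{\mu\nu} - p_\mu p_\nu). \tag{1}$$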

r-1 M.r-1 

The first sum on the right is $\leq n/4$; the second one consists of $N = \tfrac{1}{2}n(n - 1)$ terms, therefore we cannot be sure that it tends toward zero after division by $n^2$. Putting $p_{\mu\nu} - p_\mu p_\nu = \alpha_{\mu\nu}^{(n)}$ we see immediately:

(a) A necessary and sufficient condition for $\lim_{n\to\infty} s_n/n = 0$ is

$$\lim_{n\to\infty} \frac{1}{n^2}\sum \alpha_{\mu\nu}^{(n)} = 0. \tag{2}$$
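
Formula (1) expresses the variance of the sum through the single and pairwise success probabilities only. The following sketch checks this on a small dependent example; the two-state Markov chain, the parameter names `start` and `stay`, and all function names are assumptions introduced for illustration and do not come from the paper.

```python
from fractions import Fraction
from itertools import product


def markov_joint(n, start, stay):
    """Joint law v(x_1..x_n) of a hypothetical two-state Markov chain.

    start: P(x_1 = 1); stay: P(x_{t+1} = x_t). Illustrative model only.
    """
    joint = {}
    for xs in product((0, 1), repeat=n):
        prob = start if xs[0] == 1 else 1 - start
        for a, b in zip(xs, xs[1:]):
            prob *= stay if a == b else 1 - stay
        joint[xs] = prob
    return joint


def variance_direct(joint):
    """Var(x_1 + ... + x_n) computed from the joint law itself."""
    m1 = sum(pr * sum(xs) for xs, pr in joint.items())
    m2 = sum(pr * sum(xs) ** 2 for xs, pr in joint.items())
    return m2 - m1**2


def variance_formula_1(joint, n):
    """s_n^2 = sum p_v(1-p_v) + 2 * sum_{mu<nu} (p_mu_nu - p_mu p_nu)."""
    p = [sum(pr for xs, pr in joint.items() if xs[v] == 1) for v in range(n)]
    total = sum(pv * (1 - pv) for pv in p)
    for mu in range(n):
        for nu in range(mu + 1, n):
            p_mu_nu = sum(pr for xs, pr in joint.items()
                          if xs[mu] == 1 and xs[nu] == 1)
            total += 2 * (p_mu_nu - p[mu] * p[nu])
    return total
```

With `Fraction` inputs both routes give the same rational number, which makes the role of the cross terms $p_{\mu\nu} - p_\mu p_\nu$ easy to inspect for any small joint law.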

Denoting by $\sigma_\mu^2$ the variance of $v_\mu(x)$ and by $r_{\mu\nu}$ the correlation coefficient of $v_{\mu\nu}(x, y)$ we have

$$\alpha_{\mu\nu}^{(n)} = p_{\mu\nu} - p_\mu p_\nu = r_{\mu\nu}\sigma_\mu\sigma_\nu.$$

We see that $\alpha_{\mu\nu}^{(n)}$ takes values between $-1/4$ and $+1/4$ and our condition (2) postulates that the sum of these positive and negative terms tends towards infinity less rapidly than $n^2$. As to the meaning of the signs of these terms we see that a term will be positive, zero, or negative according as $p_{\mu\nu}/p_\nu$ is greater than, equal to, or less than $p_\mu$. This means: the fact that the $\nu$th event has presented itself makes the occurrence of the $\mu$th event either more probable; or it is without influence on it; or it makes it less probable. And we see that $s_n/n$ tends toward zero only if there is a certain “equalization” or “stabilization” of positive and negative mutual influence. If in particular for a pair of values $\mu, \nu$, $r_{\mu\nu} = +1$, that is $v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 0) = 0$, the events must either both occur or both fail and $p_\mu = p_\nu$. If $r_{\mu\nu} = -1$ we have $v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 1) = 0$; the simultaneous occurrence is impossible and likewise the simultaneous failure, and $p_\mu + p_\nu = 1$. If we have $p_{\mu\nu} = 0$ (case of mutually exclusive events) then $p_\mu + p_\nu \leq 1$.

Since $s_n^2 \geq 0$ and $\sum_{\nu=1}^{n} p_\nu(1 - p_\nu) = \sum_{\nu=1}^{n} \sigma_\nu^2 \leq n/4$ we conclude from (1) that $\sum \alpha_{\mu\nu}^{(n)} \geq -n/8$ and we obtain the following simple sufficient condition for the validity of (2):





(b) Let us denote by $m_n$ the number of all combinations $\mu, \nu$ ($\mu \leq n$; $\nu \leq n$; $\mu \neq \nu$), such that, however large $n$ may be, $\alpha_{\mu\nu}^{(n)} > \epsilon$, where $\epsilon$ is a given positive number; then $\frac{1}{n^2}\sum \alpha_{\mu\nu}^{(n)}$ converges toward zero if $\lim_{n\to\infty} m_n/n^2 = 0$.

We have in fact

$$-\frac{n}{8} \leq \sum \alpha_{\mu\nu}^{(n)} \leq m_n + (N - m_n)\epsilon$$

and dividing by $n^2$ we find that $\frac{1}{n^2}\sum \alpha_{\mu\nu}^{(n)}$ is enclosed between $-\frac{1}{8n}$ and $\frac{m_n}{n^2} + \frac{N - m_n}{n^2}\epsilon$, which both tend toward zero. Roughly speaking this condition implies that for “almost all” combinations of indices the $\alpha_{\mu\nu}^{(n)}$ converge toward “negative or vanishing correlation.”


On the other hand the sum of all positive and negative terms in $\sum \alpha_{\mu\nu}^{(n)}$ cannot become less than $-n/8$. Therefore, if “almost all” positive terms are supposed to tend towards zero it follows that also almost all negative terms tend toward zero. Thus we obtain the sufficient condition (c) which is neither more nor less general than (b):

(c) The sum $\frac{1}{n^2}\sum \alpha_{\mu\nu}^{(n)}$ tends towards zero as $n \to \infty$, if “almost all” the individual terms $\alpha_{\mu\nu}^{(n)} = p_{\mu\nu} - p_\mu p_\nu$ tend toward zero. Or more exactly, the sum in question tends toward zero if $|\alpha_{\mu\nu}^{(n)}| \leq \epsilon$ for every $\epsilon$ and sufficiently large $n$ with the exception of $u_n$ terms where $\lim_{n\to\infty} u_n/n^2 = 0$. That is “convergence towards independence” for almost all combinations $\mu, \nu$ of indices. Let us, for example, assume that all the $p_\mu$ are $\neq 0$ and all the $p_{\mu\nu} = 0$; then all the $\alpha_{\mu\nu}^{(n)}$ are certainly $< 0$ and (b) is fulfilled; but it is easily seen (3) that in this case $p_1 + p_2 + \cdots + p_n \leq 1$. Therefore all the products $p_\mu p_\nu$ (with the possible exception of a finite number) tend toward zero, and (c) holds as well.


2. Statistical functions. Suppose $n$ observations have given the results $x_1, x_2, \cdots, x_n$. Let us assume for the sake of simplicity that they are all bounded between two real numbers $A$ and $B$. To each real $x$ corresponds the number $nS_n(x)$ of observations with a result $\leq x$. $S_n(x)$ is a monotone non-decreasing step function with $n$ steps, each of height $1/n$; however several steps may coincide at the same point. We have

(1) $S_n(x) = 0$ if $x < A$ and $S_n(x) = 1$ if $x \geq B$.

$S_n(x)$ is called by v. Mises the partition (Aufteilung) of the $n$ observations. $S_n(x)$ coincides with the well known cumulative frequency distribution if the attributes $c_\kappa$ ($\kappa = 1, \cdots, k$) and the corresponding relative frequencies $n_1/n, \cdots, n_k/n$ are given.





A statistical function is a function of the $x_1, x_2, \cdots, x_n$ which depends only on $S_n(x)$, the partition of the $n$ results. It will be denoted by $f\{S_n(x)\}$. If the $c_\kappa$ and the $n_\kappa/n$ are given then statistical function means simply “function of the relative frequencies” and it becomes a function of $k$ variables. In $f\{S_n(x)\}$ the partition $S_n(x)$ takes the place of the independent variable. Such a statistical function has the following properties: (a) It is a symmetric function of the $x_1, x_2, \cdots, x_n$. That is, it is independent of the succession of the $n$ results. (b) It is “homogeneous” in the following sense: If instead of $n$ observations we have $nl$ observations and if at the same time each $x_\nu$ is replaced by $l$ observations equal to $x_\nu$, then the statistical function is not changed.$^1$ Examples of statistical functions are the moments

$$\frac{1}{n}\sum_{\nu=1}^{n} x_\nu^r = \int x^r\, dS_n(x) = M_r^0$$

or, if $M_1^0 = a$, the moments about the mean $a$:

$$\frac{1}{n}\sum_{\nu=1}^{n} (x_\nu - a)^r = \int (x - a)^r\, dS_n(x) = M_r, \text{ etc.}$$
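
To make the notions of partition and statistical function concrete, here is a minimal sketch. The function names `partition`, `moment` and `central_moment` are assumptions of this example; it treats the observations as exact rationals so that the symmetry and homogeneity properties can be checked without rounding.

```python
from fractions import Fraction


def partition(xs):
    """Return S_n as a function: S_n(x) = (number of x_v <= x) / n."""
    data = sorted(xs)
    n = len(data)

    def s_n(x):
        return Fraction(sum(1 for v in data if v <= x), n)

    return s_n


def moment(xs, r):
    """r-th moment about zero, (1/n) * sum of x_v**r."""
    return sum(Fraction(v) ** r for v in xs) / len(xs)


def central_moment(xs, r):
    """r-th moment about the mean a = M_1^0."""
    a = moment(xs, 1)
    return sum((Fraction(v) - a) ** r for v in xs) / len(xs)
```

Repeating every observation $l$ times leaves `partition(xs)` unchanged at every point, and hence also the moments, which is the homogeneity property (b) in this setting.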

The independent variable in $f\{S_n(x)\}$ is a partition; but in addition we shall define $f\{P(x)\}$ where $P(x)$ is a certain bounded distribution which is not necessarily a partition. A distribution $P(x)$ is called bounded if

(1') $P(x) = 0$ if $x < A$ and $P(x) = 1$ if $x \geq B$.

If this is true for a sequence $P_1(x), P_2(x), \cdots$ with the same $A$ and $B$ then the sequence is called uniformly bounded. Let us now consider a bounded partition $P(x)$ which in every point of continuity of $P(x)$ is the limit as $n \to \infty$ of a sequence of bounded partitions $S_n(x)$. As $S_n(x)$ converges toward $P(x)$, if $f\{S_n(x)\}$ converges towards a limit $L$ which does not depend on the limiting process $S_n(x) \to P(x)$ then that limit shall be denoted by $f\{P(x)\}$; it will be called the value of the statistical function at the “point” $P(x)$ and $f\{S_n(x)\}$ will be called continuous at $P(x)$. The definition of continuity can be given also in the following way: Corresponding to every $\epsilon > 0$ exists an $\eta > 0$ such that

(2) $|f\{S_n(x)\} - f\{P(x)\}| < \epsilon$

for all values of $n$ and for every bounded $S_n(x)$ such that at every point of continuity of $P(x)$

(3) $|S_n(x) - P(x)| \leq \eta$.

In this case $f\{S_n(x)\}$ is called continuous at the point $P(x)$. Thus a statistical function is defined for bounded partitions and for certain bounded distributions which are not themselves partitions. If the continuity defined by (2) and (3) exists for a sequence $P_1(x), P_2(x), \cdots$ of bounded distributions with the same $\eta$ corresponding to a given $\epsilon$, we call the statistical function uniformly continuous at the points $P_1(x), P_2(x), \cdots$.

$^1$ This condition of homogeneity is fulfilled e.g. for $\sqrt[n]{x_1x_2\cdots x_n}$ but not for $x_1x_2\cdots x_n$.

3. The general law of large numbers. The generalization of the law of large numbers which we have in mind can be demonstrated in a way analogous to the demonstration given by v. Mises in the case of independent collectives if we introduce the results of paragraph 1 in order to estimate the variance. We shall consider here only one dimensional, bounded collectives in order to make clearer what is the essential of the generalization.

A sequence of dependent collectives $P_1(x), P_2(x), \cdots, P_n(x)$ can be given in the following manner. Let $P(x_1, x_2, \cdots, x_n)$ be the probability that the result of the first observation is $\leq x_1$, of the second $\leq x_2$, $\cdots$, of the $n$th $\leq x_n$. This distribution will be said to be bounded in $(A, B)$ if $P = 1$ when all the $x_\nu$ are $\geq B$ and $P = 0$ if at least one of these arguments is less than $A$. From this $n$-dimensional distribution we deduce $n$ one dimensional distributions

$$P_1(x) = P(x, B, \cdots, B), \quad P_2(x) = P(B, x, B, \cdots, B), \quad \cdots, \quad P_n(x) = P(B, \cdots, B, x)$$

where $P_\nu(x)$ is the probability that the $\nu$th observation be $\leq x$. The $P_\nu(x)$ are uniformly bounded in $(A, B)$ which is a consequence of $P(x_1, x_2, \cdots, x_n)$ having been assumed to be bounded in this interval. In an analogous way we deduce from $P(x_1, x_2, \cdots, x_n)$ the $\tfrac{1}{2}n(n - 1)$ uniformly bounded two dimensional distributions

(2) $P_{12}(x, y) = P(x, y, B, \cdots, B), \quad P_{13}(x, y) = P(x, B, y, B, \cdots, B), \quad \cdots$.

Here $P_{\mu\nu}(x, y)$ is the probability that the $\mu$th result is $\leq x$, the $\nu$th result $\leq y$, and we have $P_{\mu\nu}(x, y) = P_{\nu\mu}(y, x)$. Of course we have also

(1') $P_1(x) = P_{12}(x, B) = P_{13}(x, B) = \cdots = P_{1n}(x, B)$,

$P_2(x) = P_{12}(B, x) = P_{23}(x, B) = \cdots = P_{2n}(x, B)$ etc.

If we put in (2) $x = y$ we obtain $P_{\mu\nu}(x, x) = P_{\nu\mu}(x, x)$ and we introduce

(3) $P_{\mu\nu}(x, x) = P_{\nu\mu}(x) = P_{\mu\nu}(x)$,

the probability that both the $\mu$th and the $\nu$th observation is $\leq x$. Then $P_{\mu\nu}(x)$ equals zero if $x < A$ and equals one if $x \geq B$, and this is valid with the same $A$ and $B$ for all the distributions $P_{\mu\nu}(x)$.

Now if $p_1, p_2, \cdots, p_n$ are the probabilities of success for $n$ general alternatives, Tchebycheff's Lemma asserts that the probability $W$ that the average $(x_1 + x_2 + \cdots + x_n)/n$ of $n$ observations differs by more than $\eta$ from its expectation $(p_1 + p_2 + \cdots + p_n)/n$ is subject to the following inequality

$$W \leq \frac{1}{\eta^2}\operatorname{Var}\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) = \frac{s_n^2}{\eta^2n^2}. \tag{4}$$

Here $s_n^2$ is given by (1) of paragraph 1.
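
Inequality (4) can be illustrated numerically in the simplest special case. The sketch below is an assumption-laden example rather than part of the argument: it takes $n$ independent alternatives with a common success probability (so that $s_n^2 = np(1-p)$), and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def chebyshev_bound(s2, n, eta):
    """Upper bound s_n^2 / (eta^2 n^2) from inequality (4)."""
    return Fraction(s2) / (Fraction(eta) ** 2 * n**2)


def exact_tail_independent(n, p, eta):
    """Exact P(|mean - p| > eta) for n independent alternatives with
    common success probability p (a special case chosen for this sketch)."""
    p = Fraction(p)
    eta = Fraction(eta)
    total = Fraction(0)
    for j in range(n + 1):
        if abs(Fraction(j, n) - p) > eta:
            total += comb(n, j) * p**j * (1 - p) ** (n - j)
    return total
```

For dependent alternatives only the value of $s_n^2$ changes; the bound itself has the same form, which is why the growth of $s_n^2$ relative to $n^2$ is the quantity of interest in what follows.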






Let us introduce the average $\bar{P}_n(x)$ of the $P_\nu(x)$:

$$\bar{P}_n(x) = [P_1(x) + P_2(x) + \cdots + P_n(x)]/n \tag{5}$$

and let $Q_n$ be the probability that at any point of continuity of $\bar{P}_n(x)$ the inequality

$$|S_n(x) - \bar{P}_n(x)| > \eta \tag{6}$$

holds. Our aim will be to show that for every $\eta$, under certain restrictions regarding the given collectives, $Q_n$ tends toward zero as $n$ tends toward infinity.

For a fixed point $x'$ the probabilities $P_\nu(x') = p_\nu$ and $P_{\mu\nu}(x') = p_{\mu\nu}$ are constants and we put $\bar{P}_n(x') = \bar{p}_n = (p_1 + p_2 + \cdots + p_n)/n$. The probability that in $x'$

$$|S_n(x') - \bar{P}_n(x')| > \eta/2 \tag{7}$$

is then, according to (4), smaller than $(s_n^2)_{x'}/(\tfrac{1}{2}\eta)^2n^2$. Here we denote by $(s_n^2)_{x'}$ the value of $s_n^2$ in $x'$ (as given by (1) in paragraph 1).

Now we divide the interval $(A, B)$ in $N$ parts in such a way that in every one of the $N$ intervals, e.g. in $(x', x'')$, the variation

$$\delta = \bar{P}_n(x'') - \bar{P}_n(x') \leq \eta/2. \tag{8}$$

If there is at $x'$ (or at $x''$) a step of $\bar{P}_n(x)$ we take the limit which $\bar{P}_n(x)$ approaches as $x \to x'$ (or $x''$) from the interior of the interval. In order to obtain such a division we need only divide the total variation 1 of $\bar{P}_n(x)$ in $2/\eta$ equal parts and project these points of division on $\bar{P}_n(x)$, disposing however in a suitable way of horizontal parts of $\bar{P}_n(x)$. The abscissae of these points form the endpoints of the $N$ intervals. If there is a step of $\bar{P}_n(x)$ at an endpoint of one of these intervals the variation in both the adjacent intervals can only be diminished. It is further possible that the two ends of an interval coincide, $x' = x''$; this will be so if $\bar{P}_n(x)$ has for $x'$ a step $> \eta/2$. In any case we have a division in $N \leq 2/\eta$ intervals such that all the points of continuity of $\bar{P}_n(x)$ are enclosed in them and in each of these intervals (8) is valid.

Let us now assume that in the left end point $x'$ of the $\tau$th interval $(x', x'')$ the inequality

$$|S_n(x') - \bar{P}_n(x')| \leq \eta/2 \tag{9}$$

is valid. Then we have for every $x$ between $x'$ and $x''$

$$|S_n(x) - \bar{P}_n(x)| \leq \eta/2 + \delta \leq \eta. \tag{10}$$

Because, since $S_n(x)$ and $\bar{P}_n(x)$ are both monotone, the difference $S_n(x) - \bar{P}_n(x)$ cannot increase by more than $\delta \leq \eta/2$ as $x$ varies from $x'$ to $x''$. Therefore if (6) is valid for any point $x$ in this interval then (7) must be valid for the left end point $x'$ of this interval and the probability $q_\tau$ of this latter inequality is less than or equal to $4(s_n^2)_{x'}/\eta^2n^2$.

But there are $N$ intervals with the left endpoints $x_1', x_2', \cdots, x_N'$ and the probability that (6) may be valid in any point belonging to any one of these intervals is $\leq q_1 + q_2 + \cdots + q_N$. Denoting by $\bar{s}_n^2$ the greatest of the $N$ variances $(s_n^2)_{x_1'}, (s_n^2)_{x_2'}, \cdots$, we have for $Q_n$ (which is the probability that (6) may be valid at any point of continuity of $\bar{P}_n(x)$) the inequality

$$Q_n \leq q_1 + q_2 + \cdots + q_N \leq \frac{4N}{\eta^2n^2}\,\bar{s}_n^2 \leq \frac{8}{\eta^3}\,\frac{\bar{s}_n^2}{n^2}. \tag{11}$$

Therefore $Q_n$ tends toward zero for every $\eta$ if $\bar{s}_n/n$ tends toward zero.

But according to (2) in paragraph 1, $\bar{s}_n/n$ tends toward zero if for every $x$ in $(A, B)$

$$\lim_{n\to\infty} \frac{1}{n^2}\sum_{\mu,\nu=1}^{n} [P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)] = 0. \tag{12}$$

Considering the definition of continuity of a statistical function we have
obtained the following result:

As in (1'), (2), (3) and (5) let $P_{\mu\nu}(x, y)$ be two dimensional distributions ($\mu, \nu = 1, \ldots, n$; $\mu \ne \nu$),
uniformly bounded in $(A, B)$; $P_{\mu\nu}(x, B) = P_\mu(x)$; $P_{\mu\nu}(x, x) = P_{\mu\nu}(x)$
and $\bar{P}_\nu(x) = \frac{1}{\nu}(P_1(x) + P_2(x) + \cdots + P_\nu(x))$.

If the variable partition $S_n(x)$ is bounded in $(A, B)$ and if $f\{S_n(x)\}$ is uniformly
continuous at the "points" $\bar{P}_1(x), \bar{P}_2(x), \ldots$ then the probability that

(13)    $|f\{S_n(x)\} - f\{\bar{P}_n(x)\}| > \epsilon$

tends toward zero for every $\epsilon$ as $n \to \infty$, provided (12) is uniformly valid for every
$x$ in $(A, B)$.


4. Examples. Let us illustrate by simple examples.

1) In order to define the $P_\nu(x)$ etc. mentioned in our theorem we define the
n-dimensional distribution $P(x_1, x_2, \ldots, x_n)$ used at the beginning of paragraph
3 by indicating the probability density

(1)    $p(x_1, x_2, \ldots, x_n) = C_n[1 - x_1 x_2 \cdots x_n]$ in the "unit cube",
                                 $= 0$ elsewhere.

The corresponding probability distribution is

(2)    $P(x_1, x_2, \ldots, x_n) = \int_0^{x_1} \cdots \int_0^{x_n} p(x_1, x_2, \ldots, x_n)\, dx_1 \cdots dx_n$.

By putting

(3)    $C_n \left(1 - \dfrac{1}{2^n}\right) = 1$

we see that $P(x_1, x_2, \ldots, x_n)$ equals unity if all the arguments are $\ge 1$ and it
equals zero if one of these arguments is less than 0. Therefore $P(x_1, x_2, \ldots, x_n)$
is bounded in the unit cube.



From (1) we deduce the two-dimensional densities

(4)    $p_{\mu\nu}(x, y) = C_n \left(1 - \dfrac{xy}{2^{n-2}}\right)$ in the unit square,
                        $= 0$ elsewhere

and the distributions

(5)    $P_{\mu\nu}(x, y) = \int_0^x \int_0^y p_{\mu\nu}(x, y)\, dx\, dy$.

We see that

    $P_{\mu\nu}(x, y) = C_n\, xy \left(1 - \dfrac{xy}{2^n}\right)$ in the unit square
                     $= 0$ if $x$ or $y \le 0$
                     $= 1$ if $x$ and $y \ge 1$

and e.g. for $x \ge 1$, $0 < y < 1$ we have $P_{\mu\nu}(x, y) = P_{\mu\nu}(1, y)$ etc. Thus the
$P_{\mu\nu}(x, y)$ are completely given.

It follows from (3) that $-C_n/2^n = 1 - C_n$; therefore putting $C_n = C$ we
have in (0, 1)

(6)    $P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = Cx^2 + (1 - C)x^4$
       $P_\nu(x) = Cx + (1 - C)x^2$

therefore

(7)    $P_{\mu\nu}(x) - P_\mu(x) P_\nu(x) = C(1 - C)x^2(1 - x)^2$

is $< 0$ for every $x$ in (0, 1) since $C > 1$. For $x \le 0$, $P_{\mu\nu}(x)$ and $P_\nu(x)$ both
equal zero and for $x \ge 1$ they both equal 1. Therefore our conditions of paragraph 1
are fulfilled. We see that $C_n$ tends towards unity as $n \to \infty$; therefore
for every $x$ in (0, 1) $P_{\mu\nu}(x) - P_\mu(x) P_\nu(x)$ tends towards zero. We have
"convergence towards independence" but by no means independence.
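The algebra leading to (7) can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes that (3) reads $C_n(1 - 2^{-n}) = 1$ and that $P_{\mu\nu}(x, y) = C_n xy(1 - xy/2^n)$ in the unit square, as stated above.

```python
def c_n(n):
    """Normalizing constant, assuming (3) reads C_n * (1 - 2**-n) = 1."""
    return 1.0 / (1.0 - 2.0 ** (-n))

def p_joint(x, y, n):
    """P_{mu nu}(x, y) = C_n * x*y * (1 - x*y / 2**n), valid in the unit square."""
    return c_n(n) * x * y * (1.0 - x * y / 2.0 ** n)

def p_marginal(x, n):
    """P_nu(x) = P_{mu nu}(x, 1)."""
    return p_joint(x, 1.0, n)

def dependence_gap(x, n):
    """Left side of (7): P_{mu nu}(x, x) - P_mu(x) * P_nu(x)."""
    return p_joint(x, x, n) - p_marginal(x, n) ** 2

def closed_form(x, n):
    """Right side of (7): C (1 - C) x^2 (1 - x)^2."""
    c = c_n(n)
    return c * (1.0 - c) * x ** 2 * (1.0 - x) ** 2

# The gap is negative inside (0, 1) and shrinks toward zero as n grows.
for n in (2, 5, 10, 20):
    gaps = [dependence_gap(k / 10.0, n) for k in range(1, 10)]
    print(n, c_n(n), min(gaps))
```

The printed minimum of the gap moves toward zero as $n$ increases, which is the "convergence towards independence" of the text.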

This example was based on a symmetric density. Let us give an example of
asymmetric and arithmetic distributions. For the sake of simplicity let $P_1(x)$,
$P_2(x), \ldots$ be arithmetic distributions each with only three steps at $x = 0$, 1
and 2. As starting point we take the n-dimensional arithmetic distribution
$v(x_1, x_2, \ldots, x_n)$ which gives the probability that the first result equals $x_1$, the
second $x_2$, $\ldots$, the nth $x_n$, the $x_\nu$ being equal to 0 or 1 or 2; thus $v(x_1, x_2, \ldots, x_n)$
takes $3^n$ values the sum of which equals unity. We deduce the two dimensional
distributions $v_{\mu\nu}(x, y)$, e.g. $v_{12}(x, y) = \sum v(x, y, x_3, \ldots, x_n)$, the probability
that the first result equals $x$, the second $y$, and finally the $v_1(x) = \sum_y v_{12}(x, y)$, etc.
According to the definitions of $P_\nu(x)$ and $P_{\mu\nu}(x)$ we have then:



(8)    $P_\nu(x) = 0$                          $(x < 0)$
                $= v_\nu(0)$                   $(0 \le x < 1)$
                $= v_\nu(0) + v_\nu(1)$        $(1 \le x < 2)$
                $= 1$                          $(2 \le x)$

(9)    $P_{\mu\nu}(x) = 0$                     $(x < 0)$
                $= v_{\mu\nu}(0, 0)$           $(0 \le x < 1)$
                $= v_{\mu\nu}(0, 0) + v_{\mu\nu}(1, 0) + v_{\mu\nu}(0, 1) + v_{\mu\nu}(1, 1)$    $(1 \le x < 2)$
                $= 1$                          $(2 \le x)$.


Now we subject $v(x_1, \ldots, x_n)$ to the following conditions: Every $v(x_1, \ldots, x_n)$
equals zero if it contains either: at least two "zeros," or: at least one "zero"
and one "one," or: at least two "ones." All the other $v$-values are supposed
to be different from zero. Then we have

    $v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 0) = v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 1) = 0$

therefore $P_{\mu\nu}(x) = 0$ for $x < 2$ and $P_{\mu\nu}(x) = 1$ for $x \ge 2$. On the other hand
$v_\nu(0) = v(2, 2, \ldots, 2, 0, 2, \ldots, 2)$ and $v_\nu(1) = v(2, 2, \ldots, 2, 1, 2, \ldots, 2)$; therefore
$P_\nu(x) \ne 0$ for $0 \le x < 2$ and we have thus for every finite $n$

    $P_{\mu\nu}(x) - P_\mu(x) P_\nu(x) = 0$ for $x < 0$ and $x \ge 2$,
                                      $< 0$ for $0 \le x < 2$.

Therefore the condition (b) of paragraph 1 is fulfilled and thus (12) paragraph 3
holds.
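A small enumeration makes the arithmetic example concrete. The sketch below is illustrative only: it assumes equal weights on the admissible outcomes (the text requires only that they be different from zero), and the helper names are hypothetical.

```python
from itertools import product

def build_v(n):
    """Joint probabilities v(x_1..x_n) on {0,1,2}^n obeying the zero conditions.

    An outcome is admissible when it holds at most one entry different from 2.
    Equal weights are an arbitrary choice made for illustration.
    """
    allowed = [t for t in product((0, 1, 2), repeat=n)
               if t.count(0) + t.count(1) <= 1]
    w = 1.0 / len(allowed)
    return {t: w for t in allowed}

def cdf_single(v, nu, x):
    """P_nu(x): probability that the nu-th result is <= x."""
    return sum(p for t, p in v.items() if t[nu] <= x)

def cdf_pair(v, mu, nu, x):
    """P_{mu nu}(x) = P_{mu nu}(x, x): both results <= x."""
    return sum(p for t, p in v.items() if t[mu] <= x and t[nu] <= x)

def gap(v, mu, nu, x):
    """P_{mu nu}(x) - P_mu(x) * P_nu(x)."""
    return cdf_pair(v, mu, nu, x) - cdf_single(v, mu, x) * cdf_single(v, nu, x)

v = build_v(4)
for x in (-1, 0, 1, 2):
    print(x, gap(v, 0, 1, x))
```

With $n = 4$ there are nine admissible outcomes, and the printed gap is zero for $x < 0$ and $x \ge 2$ and negative for $x = 0$ and $x = 1$, in agreement with the display above.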

I hope to have the opportunity to discuss more general applications of this 
theorem later. 

A generalization of the strong law of large numbers may be given in a simi¬ 
lar way. 


REFERENCES 

[1] B. H. Camp, The Mathematical Part of Elementary Statistics, New York, 1934, page 256.

[2] R. v. Mises, "Die Gesetze der grossen Zahl für statistische Funktionen," Monatshefte
für Math. u. Physik, 1936, p. 105-128.

[3] H. Geiringer, "Sur les variables aléatoires arbitrairement liées," Revue de l'Union
Interbalcanique, 1938, p. 6.

Bryn Mawr College,
Bryn Mawr, Pennsylvania.



CONDITIONS FOR UNIQUENESS IN THE PROBLEM OF MOMENTS 

By M. G. Kendall 


It was shown by Stieltjes [1] that in some circumstances it is possible for two
different frequency distributions to have the same set of moments. For instance,
the integral

    $\int z^{4n+3} e^{-z} e^{iz}\, dz$

around a contour consisting of the positive x-axis, the infinite quadrant and
the positive y-axis is seen to be zero and it follows that

    $\int_0^\infty x^n e^{-x^{1/4}} \sin x^{1/4}\, dx = 0$.

Thus the frequency distribution

(1)    $dF = \frac{1}{24} e^{-x^{1/4}} (1 - \lambda \sin x^{1/4})\, dx$,    $0 < x < \infty$,  $0 < \lambda < 1$

has moments which are independent of $\lambda$, and equation (1) may be regarded as
defining a whole family of distributions each of which has the same moments.
It is easy to see that moments of all orders exist, and in fact

    $\mu_r$ (about the origin) $= \frac{1}{6}(4r + 3)!$.
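The vanishing of the sine integral can be confirmed without numerical quadrature. The sketch below is an illustration under stated assumptions: it uses the elementary identity $\int_0^\infty u^m e^{-(1-i)u}\,du = m!/(1-i)^{m+1}$ and the substitution $x = u^4$, and it takes the normalizing constant of (1) to be 1/24; the function names are hypothetical.

```python
import math

def sine_integral(m):
    """Value of the integral of u**m * exp(-u) * sin(u) over (0, inf).

    Imaginary part of m! / (1 - i)**(m + 1), since exp(-(1 - i) u) has
    imaginary part exp(-u) * sin(u).
    """
    return math.factorial(m) * ((1 - 1j) ** (-(m + 1))).imag

def moment(r):
    """Moment of order r about the origin for the family (1): (4r + 3)! / 6."""
    return math.factorial(4 * r + 3) // 6

for n in range(4):
    m = 4 * n + 3  # x = u**4 turns x**n dx into 4 * u**(4n + 3) du
    print(n, sine_integral(m), moment(n))
```

Because $(1 - i)^4 = -4$ is real, the imaginary part vanishes whenever $m + 1$ is a multiple of 4, which is exactly the case $m = 4n + 3$ needed here; for other $m$ (for instance $m = 0$ or 1) the integral is not zero.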


A second example of the same kind, also due to Stieltjes, is the distribution

(2)    $dF = \dfrac{1}{e^{1/4}\sqrt{\pi}}\, x^{-\log x} \{1 - \lambda \sin(2\pi \log x)\}\, dx$,    $0 < x < \infty$,  $0 < \lambda < 1$,

for which

    $\mu_r = e^{\frac{1}{4} r(r+2)}$.
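The moments of this second family can be verified by the substitution $x = e^u$. The following is a sketch of that calculation, assuming the density is $x^{-\log x}\{1 - \lambda\sin(2\pi\log x)\}$ up to a normalizing constant, and using the Gaussian integral $\int_{-\infty}^{\infty} e^{bu - u^2}\,du = \sqrt{\pi}\,e^{b^2/4}$, which holds for complex $b$.

```latex
\int_0^\infty x^{r}\, x^{-\log x}\, dx
  = \int_{-\infty}^{\infty} e^{(r+1)u - u^{2}}\, du
  = \sqrt{\pi}\; e^{(r+1)^{2}/4},
\qquad
\int_0^\infty x^{r}\, x^{-\log x} \sin(2\pi \log x)\, dx
  = \operatorname{Im} \int_{-\infty}^{\infty} e^{(r+1+2\pi i)u - u^{2}}\, du
  = \sqrt{\pi}\; e^{\{(r+1)^{2} - 4\pi^{2}\}/4} \sin\{\pi(r+1)\} = 0 .
```

Dividing the first result by its value at $r = 0$, namely $\sqrt{\pi}\,e^{1/4}$, gives $e^{\{(r+1)^2 - 1\}/4} = e^{r(r+2)/4}$, and the second result shows that the term in $\lambda$ contributes nothing to any moment of integral order.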


The question naturally arises, what are the conditions under which a given 
set of moments determines a frequency distribution uniquely? The question 
is of great interest to mathematicians, being closely linked with problems in the 
theory of asymptotic series, continued fractions and quasi-analytic functions; 
and it also has importance for statisticians since there is sometimes occasion to 
be satisfied that a problem of finding a frequency distribution has been uniquely 
solved by the ascertainment of its moments or semi-invariants. Stieltjes himself
considered a more general problem: given a set of constants $c_0, c_1, \ldots, c_r, \ldots$
does there exist a function $F$, non-decreasing and possessing an
infinite number of points of increase, such that

(3)    $\int_0^\infty x^r\, dF = c_r$

and under what conditions is $F$ unique, except for an additive constant?
Stieltjes showed that if we express the series

(4)    $\sum_{r=0}^{\infty} (-1)^r \dfrac{c_r}{z^{r+1}}$

as a continued fraction of the form

(5)    $\cfrac{1}{a_1 z + \cfrac{1}{a_2 + \cfrac{1}{a_3 z + \cfrac{1}{a_4 + \cdots + \cfrac{1}{a_{2n-1} z + \cfrac{1}{a_{2n} + \cdots}}}}}}$

it is a necessary and sufficient condition for the existence of at least one $F$ that
all the $a$'s be positive; and that the function is unique or not according as the
series $\sum a_r$ diverges or converges. (If the $a$'s are positive it must do one or
the other.) The integral of equation (3) is to be interpreted in the general
Stieltjes sense, so that the result applies to discontinuous as well as to continuous
distributions. This is also true of the results obtained below.

Hamburger [2] discussed the similar problem when the limits of the integral
in equation (3) are $\pm\infty$, and showed that a function $F$ exists if the expression
of (4) as a continued fraction of the form

    $\cfrac{b_0}{a_0 + z + \cfrac{b_1}{a_1 + z + \cfrac{b_2}{a_2 + z + \cdots}}}$

gives positive values of the $b$'s. In order that $F$ may be unique it is necessary
and sufficient that the continued fraction be completely (vollständig) convergent
in the sense defined by Hamburger.

Unfortunately these criteria, though mathematically complete, are not very 
useful to statisticians because as a rule it is too difficult to express the coefficients 
a and b explicitly enough in terms of the given c’s to enable questions of sign or 
of convergence to be decided. So far as I know, no more convenient criterion 
for the general Stieltjes problem has been found; but progress is possible if one 
considers the narrower question: given a set of moments, is the distribution 
which furnished them unique, that is to say, can any other distribution have 
furnished them? This is more limited than the Stieltjes problem because we 
know that at least one solution exists. 

Contributions to this subject have been made by Lévy [3] and Carleman [4].
Lévy shows that if moments of all orders exist and are positive it is a sufficient
condition for them to determine a distribution uniquely that $\mu_n^{1/n}/n$ remains
finite as $n$ tends to infinity. (Here and elsewhere in this paper $\mu_r$ refers to the
moment of order $r$ about any point, not necessarily the mean.) Carleman shows
that, for the case of limits $-\infty$ to $+\infty$ the moments determine the distribution
uniquely if

    $\sum_{r=0}^{\infty} \dfrac{1}{\mu_{2r}^{1/(2r)}}$

diverges. For the limits 0 to $\infty$ he gives the corresponding series

    $\sum_{r=0}^{\infty} \dfrac{1}{\mu_r^{1/(2r)}}$,

a criterion which can be improved upon, as will be shown below.

The purpose of this paper is to develop criteria of this kind more systematically 
and to give more general criteria suitable in cases where the moments are not 
known explicitly but the behavior of the frequency distribution at its terminals 
is known. 

Three preliminary points necessary for the later argument may be noted. 

(1) Define the absolute moment of order $r$ by

    $\nu_r = \int_{-\infty}^{\infty} |x^r|\, dF$

and recall that

    $\nu_1 < \nu_2^{1/2} < \nu_3^{1/3} < \cdots < \nu_r^{1/r} < \cdots$

(cf. Hardy and others, [5]). In other words the quantities $\nu_r^{1/r}$ form an increasing
positive sequence and their reciprocals a decreasing positive sequence.

(2) The quantity $\nu_n^{1/n}/n$ must either tend to a limit or diverge to infinity as
$n \to \infty$. For suppose that

    $\overline{\lim}\ \nu_n^{1/n}/n = k$,
    $\underline{\lim}\ \nu_n^{1/n}/n = l$.

Writing temporarily $\nu_n^{1/n} = a_n$, we have that, given $\epsilon$ there is an $N$ such that

    $a_n/n > k - \epsilon$

for an infinity of values of $n$ greater than $N$. Similarly there is an $M$ such that

    $a_n/n < l + \epsilon$

for an infinity of values of $n$ greater than $M$. Now choose $p$ such that $a_p$, $a_{p+1}$
are two consecutive values, one near the upper limit and one near the lower
limit. This can always be done and we can take $p$ as large as we please. We
then have

    $a_p > p(k - \epsilon)$
    $a_{p+1} < (p + 1)(l + \epsilon)$

and hence, since $a_{p+1} > a_p$,

    $(k - \epsilon)p < (p + 1)(l + \epsilon)$

giving

    $k - l < \dfrac{l}{p} + 2\epsilon + \dfrac{\epsilon}{p}$.

Thus $k - l$ can be made as small as we please and is thus zero.

The argument can be very simply adapted to the case in which $k$ is infinite,
and if $l$ is not finite $k$, being not less than $l$, is infinite. Thus as $n \to \infty$ either
$\lim a_n/n$ exists or $a_n/n \to \infty$.¹

(3) If any moment fails to converge, so will all moments of higher order. It
is evident that more than one distribution can exist having a limited number
of finite moments given and the remainder infinite. Thus we need only consider
the case when moments of all orders exist. Furthermore, if any even moment
exists the absolute moment of next lowest order must exist; for if $\int_{-\infty}^{\infty} x^{2n}\, dF$
exists, then each of $\int_{-\infty}^{0} x^{2n}\, dF$ and $\int_0^{\infty} x^{2n}\, dF$ exist separately, each being positive.
Hence $\int_{-\infty}^{0} x^{2n-1}\, dF$ and $\int_0^{\infty} x^{2n-1}\, dF$ exist separately and thus

    $\int_{-\infty}^{\infty} |x^{2n-1}|\, dF = -\int_{-\infty}^{0} x^{2n-1}\, dF + \int_0^{\infty} x^{2n-1}\, dF$

exists. Hence we need only consider the case in
which absolute moments of all orders exist.

Theorem 1. A set of moments determines a distribution uniquely if the series
$\sum_{r=0}^{\infty} \dfrac{\nu_r t^r}{r!}$ converges for some real non-zero $t$.

Consider the characteristic function

    $\phi(t) = \int_{-\infty}^{\infty} e^{itx}\, dF$.

This is uniformly continuous in $t$, and so are its derivatives of all orders. Thus
we have, in the neighborhood of $t = 0$ the Maclaurin expansion

    $\phi(t) = \sum_{r=0} \left[\dfrac{d^r \phi}{dt^r}\right]_{t=0} \dfrac{t^r}{r!}$.

¹ This proof is necessary to the use of limits in the following theorems, but Theorems 2
and 3 are equally valid if $\overline{\lim}$ is substituted for lim therein. It is not generally true that
if $a_n$ and $b_n$ are increasing monotonic sequences either $\lim a_n/b_n$ exists or $a_n/b_n \to \infty$ as
$n \to \infty$.

Consequently, under the condition of the theorem, which implies that
$\sum \dfrac{(it)^r}{r!} \mu_r$ is absolutely convergent for some radius $\rho$, $\phi(t)$ has a Taylor
expansion in the neighborhood of the origin and is thus uniquely determined by
the moments for $t < \rho$. Furthermore, in the neighborhood of $t = t_0$ we have

    $\phi(t) = \sum_{r=0}^{k} \dfrac{i^r (t - t_0)^r}{r!} \int_{-\infty}^{\infty} x^r e^{it_0 x}\, dF + R$.

The modulus of the coefficient of $\dfrac{(t - t_0)^r}{r!}$ is not greater than $\nu_r$. Therefore $\phi(t)$
can be expanded in the neighborhood of $t = t_0$ in a Taylor series with a radius
of convergence at least equal to $\rho$. Hence the function defining $\phi(t)$ in the
neighborhood of the origin can be continued analytically throughout the range
$-\infty$ to $+\infty$ and $\phi(t)$ is uniquely determined in that range.

But the characteristic function uniquely determines the distribution; and
hence the theorem follows.

As a result of Theorem 1 we have the following generalization of the criterion
given by Lévy.

Theorem 2. A set of moments completely determines a distribution if
$\lim_{n \to \infty} \nu_n^{1/n}/n$ is finite.

It has already been seen that unless $\nu_n^{1/n}/n$ becomes infinite the limit exists.
By the Cauchy test for convergence the series $\sum \dfrac{\nu_r t^r}{r!}$ converges if

(7)    $\lim_{n \to \infty} \left(\dfrac{\nu_n t^n}{n!}\right)^{1/n} < 1$.

As $n \to \infty$, $(n!)^{1/n}$ tends, in accordance with Stirling's theorem, to
$(\sqrt{2\pi n}\; e^{-n} n^n)^{1/n}$, i.e. to $n/e$. Consequently the condition (7) becomes

    $\lim\, [\nu_n^{1/n}/n]\, e t < 1$.

Thus if $\lim \nu_n^{1/n}/n = k$, say, the inequality (7) is satisfied for $t < 1/(ek)$ and the
theorem follows.
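The quantity $\nu_n^{1/n}/n$ in Theorem 2 is easy to tabulate. The sketch below is illustrative only: it compares the distribution $dF = e^{-x}dx$, whose moments are $n!$, with the Stieltjes family of equation (1), whose moments are taken to be $(4n+3)!/6$; logarithms of the moments are used to avoid overflow, and the function names are hypothetical.

```python
import math

def scaled_root(log_moment, n):
    """Return nu_n**(1/n) / n, given log(nu_n)."""
    return math.exp(log_moment / n) / n

def log_moment_exponential(n):
    """log of n!, the n-th moment of dF = exp(-x) dx on (0, inf)."""
    return math.lgamma(n + 1)

def log_moment_stieltjes(n):
    """log of (4n + 3)! / 6, the n-th moment of the family (1)."""
    return math.lgamma(4 * n + 4) - math.log(6)

for n in (10, 100, 1000, 10000):
    print(n,
          scaled_root(log_moment_exponential(n), n),
          scaled_root(log_moment_stieltjes(n), n))
```

For the exponential distribution the first column settles near $1/e$, so the condition of Theorem 2 is met; for the family (1) the second column grows without bound, so Theorem 2 gives no conclusion, as it must for a family that is not determined by its moments.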

An important corollary, which enables us to disregard the absolute moments
(which may not be given if part of the range is negative) is

Theorem 3. A set of moments uniquely determines a distribution if
$\lim \mu_{2n}^{1/(2n)}/n$ is finite.

We have

    $\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/(2n)} = \mu_{2n}^{1/(2n)}$.

Thus,

    $\lim \dfrac{1}{2n-1}\, \nu_{2n-1}^{1/(2n-1)} \le \lim \dfrac{2n}{2n-1} \cdot \dfrac{1}{2n}\, \mu_{2n}^{1/(2n)}$

and is therefore finite if the limit on the right is finite. Thus $\lim \nu_n^{1/n}/n$, which
cannot be greater than the greater of the two limits of $\nu_{2n-1}^{1/(2n-1)}/(2n - 1)$ and
$\mu_{2n}^{1/(2n)}/(2n)$, must be finite; and the theorem follows from Theorem 2.

Now consider the series $\sum_{r=0}^{\infty} \dfrac{1}{\nu_r^{1/r}}$. Since the successive terms form a monotonic
sequence it is a sufficient as well as a necessary condition for convergence that
$n/\nu_n^{1/n}$ tend to zero. Thus, if the series is divergent $n/\nu_n^{1/n}$ cannot tend to zero
and so $\nu_n^{1/n}/n$ cannot become infinite. Hence it must tend to a finite limit, which
may in particular be zero. Hence from Theorem 3 we get

Theorem 4. A frequency distribution is uniquely determined by its moments if
$\sum_{r=0}^{\infty} \dfrac{1}{\nu_r^{1/r}}$ diverges.

Since $1/\nu_r^{1/r}$ is a decreasing sequence the series $\sum 1/\nu_r^{1/r}$ converges or diverges
with $\sum 1/\mu_{2r}^{1/(2r)}$. The Carleman criterion, given by him for the case of limits
$\pm\infty$, follows. For the case of limits 0 to $\infty$ the absolute moments are the same
as the moments and the criterion can be the divergence of either $\sum 1/\mu_r^{1/r}$ or
$\sum 1/\mu_r^{1/(2r)}$. Since $\mu_r$ is greater than unity in the type of case under consideration
the former series provides a more stringent test than that given by Carleman.
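The behaviour of the series in Theorem 4 can be illustrated by partial sums. The sketch below is an illustration only, for two distributions of positive range (so that $\nu_r = \mu_r$): the exponential distribution with $\mu_r = r!$ and the Stieltjes family (1) with $\mu_r$ taken to be $(4r+3)!/6$. The summation starts at $r = 1$, and the function names are hypothetical.

```python
import math

def partial_sum(log_moment, terms):
    """Partial sum of 1 / mu_r**(1/r) for r = 1..terms, with mu_r given as log(mu_r)."""
    return sum(math.exp(-log_moment(r) / r) for r in range(1, terms + 1))

def log_mu_exponential(r):
    """log of r!."""
    return math.lgamma(r + 1)

def log_mu_stieltjes(r):
    """log of (4r + 3)! / 6."""
    return math.lgamma(4 * r + 4) - math.log(6)

for terms in (10, 100, 1000, 10000):
    print(terms,
          partial_sum(log_mu_exponential, terms),
          partial_sum(log_mu_stieltjes, terms))
```

The first column keeps growing roughly like a multiple of $\log r$, so the series diverges and uniqueness follows; the second column stabilizes after a few terms, so the criterion is silent for the family (1). Convergence of the series does not by itself prove non-uniqueness, since the criterion is only a sufficient one.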

At first sight it is rather surprising that the uniqueness of the distribution 
depends only on the behavior of the even moments, particularly when, by a 
simple extension of the above result, it is seen that a sufficient condition for 
uniqueness is the divergence of 2 I/mII* 4 ’ 0 or 2 1 /m**"" 0 or any infinite subset 
chosen from the moments. It will, however, be remembered that the odd 
moments are conditioned to some extent by the even moments, and that unique¬ 
ness is really determined by the limiting form of v n as n tends to infinity. 

It is evident that other tests may be derived from Theorem 1 by using the 
various tests for the convergence of an infinite series. For instance it is a suffi¬ 
cient condition for a set of moments to determine uniquely a distribution with 
positive range that 


- / 
n!/ 


M«+i 

(n+1)! 


1 + 




where 


a > 1 
0 > 0 


i.e. that 

( 8 ) 


M» 

Mn+l 


1 + 




y > 0 . 


It may be noted in passing that the distribution

    $dF = e^{-x}\, dx$,    $0 < x < \infty$,

for which

    $\mu_r$ (about origin) $= r!$

is completely determined by its moments. In fact, by direct reference to
Theorem 1 we see that the series $\sum t^r$ converges for $t < 1$.



A frequency distribution of finite range is uniquely determined by its moments.
For if the range is 0 to $A$ we have

    $\int_0^A x^r\, dF < A^r$

and hence $1/\mu_r^{1/r} > 1/A$ so that the series $\sum 1/\mu_r^{1/r}$ is divergent.

A proof for the case when the frequency distribution is continuous has been
given by Lévy, though on entirely different lines from the above.

Theorem 5. A frequency distribution of infinite range is uniquely determined
by its moments if it tends to zero at the infinite terminals faster than $e^{-x}$.

Consider first of all the case when only one end of the range is infinite, so
that we may take the range to be 0 to $\infty$.

If $(\mu_n/n!)^{1/n}$ has a finite limit the distribution is unique, by Theorem 2. We
have then only to consider the cases (if any) in which $(\mu_n/n!)^{1/n}$ tends to infinity.
It will be shown that in fact such cases do not occur.

Given any (small) $\epsilon$ there exists an $X$ such that

    $f(x) < \epsilon\, e^{-x}$,    $x > X$

where $f(x)$ is the distribution. Thus

(9)    $\int_X^\infty f(x) x^n\, dx < \epsilon \int_X^\infty e^{-x} x^n\, dx < \epsilon\, n!$.

This is true for all $n$ and $X$ is independent of $n$. Now,

    $\int_0^\infty f(x) x^n\, dx = \int_0^X f(x) x^n\, dx + \int_X^\infty f(x) x^n\, dx$.

The first integral on the right is not greater than $X^n$. The integral on the left
tends, for large $n$, to something of greater order than $n!$, by our hypothesis, and
hence to something of greater order than $n^n$. This is of greater order than $X^n$
(since $X$, however large, is independent of $n$) and consequently the second
integral on the right is also of greater order than $n!$. But this is contrary to
equation (9).

The case for the range which is infinite in both directions may be dealt with
similarly.

It is easily seen that the two examples of equations (1) and (2) do not tend
to zero faster than $e^{-x}$.

Except for the general result of Stieltjes, all the above criteria provide
sufficient conditions, but whether the condition of Theorem 1 is also necessary is
not certain. An inquiry into the circumstances in which the moment-series
of Theorem 1 does not converge throws some light on the question.

It will be remembered that the characteristic function always exists and is
uniformly continuous in $t$. Since the moments of all orders are assumed to exist
we always have

    $\left[\dfrac{d^r \phi}{dt^r}\right]_{t=0} = i^r \mu_r$.

Thus, if $\phi(t)$ can be expanded in an infinite Taylor series that series must be
$\sum \dfrac{(it)^r}{r!} \mu_r$. And if this series does not converge then $\phi(t)$ cannot be expanded
as an infinite Taylor series. But it can always be expanded in the finite form
with remainder

    $\phi(t) = \sum_{r=0}^{k} \dfrac{(it)^r}{r!} \mu_r + R$.

Thus, when the series does not converge, $\phi(t)$ can be expanded in powers of $t$
only asymptotically.

Now it is known that there exist an infinite number of functions which have
a given set of coefficients in an asymptotic expansion; for instance, if $\psi(t)$ has
an asymptotic expansion in $t$ the functions $\psi(t) + \lambda t^{\log t}$ all have the same
expansion. It is therefore hardly surprising that when the conditions of
Theorem 1 break down there can be more than one frequency distribution with
the same set of moments.

But it does not follow from what has been said that there must be more
than one frequency distribution. There must be more than one function, but
those functions may not qualify as frequency distributions, e.g. they may be
negative in part of the range. In the example just given $t^{\log t}$ cannot be a
characteristic function, for it does not obey the well-known condition that $\phi(t)$
and $\phi(-t)$ should be conjugate.

However, the question is more of mathematical than of statistical interest
since the criteria provided above are likely to be adequate for the distributions
encountered in practice. For example they establish the uniqueness of the Pearson
curves (including the normal curve), the Poisson and the binomial. It
would seem that distributions like those of equations (1) and (2) will appear
only as statistical curiosities.

REFERENCES 

[1] J. Stieltjes, Recherches sur les fractions continues, Oeuvres, Groningen, 1918.

[2] H. Hamburger, "Über eine Erweiterung des Stieltjesschen Momentenproblems," Math.
Annalen, Vol. 81 (1920), p. 235, and Vol. 82 (1921), pp. 120 and 168.

[3] P. Lévy, Calcul des Probabilités, 1925, Paris.

[4] T. Carleman, Les Fonctions Quasi-analytiques, 1925, Paris.

[5] G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, 1934, Cambridge, England.


London, England. 



ON SAMPLES FROM A NORMAL BIVARIATE POPULATION 

By C. T. Hsu 


1. Introduction. In a number of papers written during the last ten years,
J. Neyman and E. S. Pearson¹ have discussed certain general principles underlying
the choice of tests of statistical hypotheses. They have suggested that
any formal treatment of the subject requires in the first place the specification
of (i) the hypothesis to be tested, say $H_0$, (ii) the admissible alternative
hypotheses. An appropriate test will then consist of a rule to be applied to
observational data, for rejecting $H_0$ in such a way that (iii) the risk of rejecting
$H_0$ when it is true is fixed at some desired value (e.g., 0.05 or 0.01), (iv) the risk
of failing to reject $H_0$ when some one of the admissible alternatives is true is
kept as small as possible. With these general principles in mind, they have
investigated how best the condition (iv) may be satisfied in different classes of
problems. In many cases, though not in all, it has been found that the conditions
are satisfied by the test obtained from the use of what has been termed
the likelihood ratio, [9], [10], [14]. Once the problem has been specified, the
test criterion is usually very easily found, although its sampling distribution,
if $H_0$ is true, often presents great difficulties. In the present paper, I propose
to use this method to obtain appropriate tests for a number of hypotheses
concerning two normally correlated variables. The investigation was suggested
by a recent application of the method by W. A. Morgan [6] to a problem originally
discussed by D. J. Finney [3].


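2. The hypotheses and the appropriate criteria. A sample of two variables
$x_1$ and $x_2$ is supposed to have been drawn at random from a normal bivariate
population, with the distribution

(1)    $p(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}} \exp\left[-\dfrac{1}{2(1 - \rho_{12}^2)}\left\{\dfrac{(x_1 - \xi_1)^2}{\sigma_1^2} - \dfrac{2\rho_{12}(x_1 - \xi_1)(x_2 - \xi_2)}{\sigma_1\sigma_2} + \dfrac{(x_2 - \xi_2)^2}{\sigma_2^2}\right\}\right]$

where $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, and $\rho_{12}$ are the population parameters.

Morgan tested the hypothesis that the variances of the two variables are
equal, i.e.,

    $H_1$:  $\sigma_1 = \sigma_2$.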

1 See bibliography at the end of the paper. 

Other hypotheses that will be considered in the present paper are as follows:

$H_2$  Assuming $\sigma_1 = \sigma_2$; to test $\rho_{12} = \rho_0$.

$H_3$  Assuming $\sigma_1 = \sigma_2$; to test $\xi_1 = \xi_2$.

$H_4$  To test simultaneously $\sigma_1 = \sigma_2$, $\rho_{12} = \rho_0$.

$H_5$  To test simultaneously $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

$H_6$  Assuming $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$; to test $\rho_{12} = \rho_0$.

$H_7$  Assuming $\sigma_1 = \sigma_2$ and $\rho_{12} = \rho_0$; to test $\xi_1 = \xi_2$.


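Derivation of the criteria. Let $x_{1i}$, $x_{2i}$ be the measurements of the two
characters on the $i$th individual of the sample, then the joint elementary probability
law of the two sets of $n$ observations $E \equiv (x_{11}, x_{12}, \ldots, x_{1n}; x_{21}, x_{22}, \ldots, x_{2n})$ is

(2)    $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12}) = \left(\dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}}\right)^n \exp\left[-\dfrac{1}{2(1 - \rho_{12}^2)} \sum_{i=1}^{n} \left\{\dfrac{(x_{1i} - \xi_1)^2}{\sigma_1^2} - \dfrac{2\rho_{12}(x_{1i} - \xi_1)(x_{2i} - \xi_2)}{\sigma_1\sigma_2} + \dfrac{(x_{2i} - \xi_2)^2}{\sigma_2^2}\right\}\right]$.

It will be convenient to denote by A, B, C, D, the following conditions of the
population from which the sample is supposed to be drawn.

(A) that stated in equation (1).

(B) that stated in the equation for $H_1$, namely
    $\sigma_1 = \sigma_2 = \sigma$ ($\sigma$ being unspecified).

(C) $\xi_1 = \xi_2 = \xi$ ($\xi$ being unspecified).

(D) $\rho_{12} = \rho_0$.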


Neyman and Pearson's method affords a simple rule for obtaining appropriate
test criteria once two sets of conditions have been defined. These are
(a) the conditions which can be assumed to be satisfied in any case, and
(b) the conditions which are satisfied if the hypothesis to be tested is true.
The conditions (a) define a class $\Omega$ of admissible populations, and the
conditions (b) define a sub-class $\omega$ of $\Omega$ to which the population must belong if the
hypothesis tested be true.

The maximum value of $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12})$ when the parameters vary in
such a way that the population sampled always belongs to $\Omega$, is called $p(\Omega\ \text{max.})$.
The maximum value when the population is restricted to $\omega$ is called $p(\omega\ \text{max.})$.
The likelihood ratio for testing the hypothesis specifying the subset $\omega$ has been
defined to be

(3)    $\lambda = \dfrac{p(\omega\ \text{max.})}{p(\Omega\ \text{max.})}$.

It will be seen that $0 \le \lambda \le 1$. By referring $\lambda$, or a monotonic function of $\lambda$,
to its sampling distribution when the hypothesis tested is true, we obtain a
scale on which to assess our judgment of the truth of the hypothesis tested.

For each of the hypotheses $H_1$ to $H_7$, $\lambda$ of (3) can be found. However, we
shall use a more convenient criterion

(4)    $L = \lambda^{2/n}$

which is a monotonic function of $\lambda$.

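Thus the respective test criteria are found to be:

For $H_1$:

(5)    $L_1 = \dfrac{4 s_1^2 s_2^2 (1 - r_{12}^2)}{(s_1^2 + s_2^2)^2 (1 - R_1^2)}$

where $R_1 = \dfrac{2 r_{12} s_1 s_2}{s_1^2 + s_2^2}$ is the estimate of $\rho_{12}$ when $\sigma_1$ and $\sigma_2$ are assumed to be equal.

For $H_2$:

(6)    $L_2 = \dfrac{(1 - \rho_0^2)(1 - R_1^2)}{(1 - \rho_0 R_1)^2}$.

For $H_3$:

(7)    $L_3 = 1 \Big/ \left\{1 + \dfrac{(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}\right\}$.

For $H_4$:

(8)    $L_4 = \dfrac{4(1 - \rho_0^2)\, s_1^2 s_2^2 (1 - r_{12}^2)}{(s_1^2 + s_2^2)^2 (1 - \rho_0 R_1)^2} = L_1 \times L_2$.

For $H_5$:

(9)    $L_5 = \dfrac{4 s_1^2 s_2^2 (1 - r_{12}^2)}{\{s_1^2 + s_2^2 + \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2\}^2 (1 - R_2^2)} = L_1 \times L_3$.

For $H_6$:

(10)    $L_6 = \dfrac{(1 - \rho_0^2)(1 - R_2^2)}{(1 - \rho_0 R_2)^2}$

where $R_2 = \dfrac{2 r_{12} s_1 s_2 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 + \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}$ is the estimate of $\rho_{12}$ when both the $\sigma$'s and
the $\xi$'s are assumed to be equal.

For $H_7$:

(11)    $L_7 = 1 \Big/ \left\{1 + \dfrac{(1 + \rho_0)(\bar{x}_1 - \bar{x}_2)^2}{2(s_1^2 - 2\rho_0 r_{12} s_1 s_2 + s_2^2)}\right\}$.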

The different hypotheses are also given in Table V, at the end of this paper,
together with the conditions defining sets of $\Omega$ and $\omega$, and the appropriate
likelihood criteria.
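The criteria can be computed directly from the sample means, standard deviations and correlation coefficient. The Python sketch below is illustrative only: it follows equations (5) to (11) as reconstructed here, assumes that $s_1^2$ and $s_2^2$ are sample variances with divisor $n$, and uses hypothetical function names and made-up data.

```python
import math

def sample_stats(x1, x2):
    """Means, standard deviations (divisor n) and correlation of two samples."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    r12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n * s1 * s2)
    return m1, m2, s1, s2, r12

def criteria(x1, x2, rho0):
    """Return L1..L7 as a dict, following (5)-(11) as reconstructed here."""
    m1, m2, s1, s2, r = sample_stats(x1, x2)
    d2 = (m1 - m2) ** 2                 # squared difference of the means
    ss = s1 ** 2 + s2 ** 2
    R1 = 2 * r * s1 * s2 / ss
    R2 = (2 * r * s1 * s2 - d2 / 2) / (ss + d2 / 2)
    L1 = 4 * s1 ** 2 * s2 ** 2 * (1 - r ** 2) / (ss ** 2 * (1 - R1 ** 2))
    L2 = (1 - rho0 ** 2) * (1 - R1 ** 2) / (1 - rho0 * R1) ** 2
    L3 = 1 / (1 + d2 / (ss - 2 * r * s1 * s2))
    L5 = 4 * s1 ** 2 * s2 ** 2 * (1 - r ** 2) / ((ss + d2 / 2) ** 2 * (1 - R2 ** 2))
    L6 = (1 - rho0 ** 2) * (1 - R2 ** 2) / (1 - rho0 * R2) ** 2
    L7 = 1 / (1 + (1 + rho0) * d2 / (2 * (ss - 2 * rho0 * r * s1 * s2)))
    return {"L1": L1, "L2": L2, "L3": L3, "L4": L1 * L2,
            "L5": L5, "L6": L6, "L7": L7}

# Made-up measurements, for illustration only.
x1 = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]
x2 = [1.7, 3.9, 2.5, 3.6, 2.9, 3.1]
result = criteria(x1, x2, rho0=0.5)
for name in sorted(result):
    print(name, result[name])
```

Here $L_5$ is computed from its own expression (9) rather than as a product, so comparing it with $L_1 \times L_3$ gives a check on the identity stated in (9).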

To complete the solution we must find the distributions of L or some mono¬ 
tonic function of L in each case when the hypothesis tested is true, in order to 
assess the significance of an observed value of L. 


3. The distributions of the criteria. In order to simplify the problem of
finding the distributions of the criteria, consider the following transformation:

(12)    $X_i = (x_{1i} - x_{2i})/\sqrt{2}$
        $Y_i = (x_{1i} + x_{2i})/\sqrt{2}$.

It is clear that in view of (1) $X$ and $Y$ will be two normally correlated variables.
We shall denote this property by A' corresponding to A. The conditions B',
C', D' corresponding to B, C, D respectively are as follows:

    B':  $\rho_{XY} = 0$,
    C':  $\xi_X = 0$,
    D':  $\sigma_Y^2/\sigma_X^2 = \gamma_0$  (when $\rho_{XY} = 0$)

where

(13)    $\gamma_0 = \dfrac{1 + \rho_0}{1 - \rho_0}$.

Thus we have the equivalent hypotheses $H'_1, H'_2, \ldots, H'_7$ corresponding to
$H_1, H_2, \ldots, H_7$. The likelihood ratios $L'_1, L'_2, \ldots, L'_7$ may be determined in
the same way as before, and, in view of the transformation (12), it will be
seen that they are equal to $L_1, L_2, \ldots, L_7$ respectively.
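The equality of the criteria before and after the transformation can be checked on a small sample. The following sketch is illustrative only: it takes $X$ as the scaled difference and $Y$ as the scaled sum of the two measurements, uses variances with divisor $n$, and works on made-up data; the helper name is hypothetical.

```python
import math

def moments2(a, b):
    """Means, variances (divisor n) and covariance of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    cab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return ma, mb, va, vb, cab

# Made-up measurements, for illustration only.
x1 = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]
x2 = [1.7, 3.9, 2.5, 3.6, 2.9, 3.1]
X = [(a - b) / math.sqrt(2) for a, b in zip(x1, x2)]
Y = [(a + b) / math.sqrt(2) for a, b in zip(x1, x2)]

m1, m2, v1, v2, c12 = moments2(x1, x2)
mX, mY, vX, vY, cXY = moments2(X, Y)

r12 = c12 / math.sqrt(v1 * v2)
R1 = 2 * c12 / (v1 + v2)                      # 2 r12 s1 s2 / (s1^2 + s2^2)
L1_original = 4 * v1 * v2 * (1 - r12 ** 2) / ((v1 + v2) ** 2 * (1 - R1 ** 2))
L1_transformed = 1 - cXY ** 2 / (vX * vY)     # 1 - r_XY^2
u = vY / vX                                   # ratio of the transformed variances

print(L1_original, L1_transformed)
print(u, (1 + R1) / (1 - R1))
```

Both printed pairs agree to rounding error: the criterion for equal variances in the original variables is the criterion for zero correlation between $X$ and $Y$, and the variance ratio of $Y$ to $X$ equals $(1 + R_1)/(1 - R_1)$.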

The tests of the hypotheses $H'_1$, $H'_2$, $H'_3$ are now seen to be well known.
The test of $H'_1$: $\rho_{XY} = 0$ is the test for significance of a correlation coefficient,
and the criterion $L_1$ becomes

(14)    $L_1 = \lambda_1^{2/n} = 1 - r_{XY}^2$.

This test has been dealt with by Morgan [6] and Pitman [15], and has been
referred to above.

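The test of $H'_2$: $\sigma_Y^2/\sigma_X^2 = \gamma_0$ when $\rho_{XY} = 0$ can be treated as an extension
of Fisher's z-test [5], since $\gamma_0$ is specified. If we write

(15)    $u = \dfrac{S_Y^2}{S_X^2} = \dfrac{1 + R_1}{1 - R_1} = \dfrac{s_1^2 + s_2^2 + 2 r_{12} s_1 s_2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}$

the test criterion $L_2$ of (6) may be written

(16)    $L_2 = \dfrac{4u}{\gamma_0 (1 + u/\gamma_0)^2}$.

It is well known that if $H'_2$ is true, then

(17)    $p(u) = \dfrac{1}{\gamma_0 B[\frac{1}{2}(n-1), \frac{1}{2}(n-1)]} \left(\dfrac{u}{\gamma_0}\right)^{\frac{1}{2}(n-3)} \left(1 + \dfrac{u}{\gamma_0}\right)^{-(n-1)}$

and the test appropriate to $H'_2$ and therefore of $H_2$ is the associated z-test
($z = \frac{1}{2} \log u/\gamma_0$) with degrees of freedom $f_1 = f_2 = n - 1$. It may be easily shown
that the two values of $u$ cutting off equal tail areas from the distribution $p(u)$
will correspond to a single value of $L_2$.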

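The test of $H'_3$: $\xi_X = 0$ when $\rho_{XY} = 0$ is in the form of "Student's" t test.
If we write

(18)    $\dfrac{t^2}{n - 1} = \dfrac{\bar{X}^2}{S_X^2} = \dfrac{(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}$

it follows that the test criterion $L_3$ of (7) may be written

(19)    $L_3 = 1 \Big/ \left(1 + \dfrac{t^2}{n - 1}\right)$.

But it is well known that if $\xi_X = 0$, then

(20)    $p(t) = \dfrac{1}{\sqrt{n - 1}\; B[\frac{1}{2}, \frac{1}{2}(n - 1)]} \left(1 + \dfrac{t^2}{n - 1}\right)^{-\frac{1}{2} n}$.

The 5% or 1% points of significance of $t$ may be obtained from Fisher's table
[5] with degrees of freedom $f = n - 1$.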

The tests of $H_4$ and $H_5$. We infer from (14), (16) and (19) that $L_1$ is a function
of $r_{XY}$, $L_2$ a function of $S_Y$ and $S_X$, and $L_3$ a function of $\bar{X}$ and $S_X$. It is
clear that if $r_{XY}$ is distributed independently of $S_X$ and $S_Y$, then $L_1$ and $L_2$ are
independent, i.e.,

(21)    $p(L_1, L_2) = p(L_1)\, p(L_2)$

and that if $r_{XY}$ is distributed independently of $\bar{X}$ and $S_X$, then $L_1$ and $L_3$ are
independent, i.e.,

(22)    $p(L_1, L_3) = p(L_1)\, p(L_3)$.

It is known that $\bar{X}$, $\bar{Y}$ are independent of $S_X$, $S_Y$, $r_{XY}$; and in addition that
$r_{XY}$ is distributed independently of $S_X$, $S_Y$ if $\rho_{XY} = 0$. Therefore, if $H'_1$ is
true, then the relations (21) and (22) hold. Hence, knowing $p(L_1)$ and $p(L_2)$,
a very simple transformation and integration gives $p(L_4)$. Similarly, the
distribution of $L_5$ may be readily derived from those of $L_1$ and $L_3$.

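But from the distribution of $r_{XY}$ when $\rho_{XY} = 0$, by transformation (14), the
distribution of $L_1$ assuming $H'_1$ true is found to be

(23)    $p(L_1) = \dfrac{1}{B[\frac{1}{2}, \frac{1}{2}(n - 2)]}\, L_1^{\frac{1}{2}(n-4)} (1 - L_1)^{-\frac{1}{2}}$.

If $H'_2$ is true, from (17), by transformation (16) we have

(24)    $p(L_2) = \dfrac{1}{B[\frac{1}{2}, \frac{1}{2}(n - 1)]}\, L_2^{\frac{1}{2}(n-3)} (1 - L_2)^{-\frac{1}{2}}$.

Again, if $H'_3$ is true, from (20), by transformation (19), we have

(25)    $p(L_3) = \dfrac{1}{B[\frac{1}{2}, \frac{1}{2}(n - 1)]}\, L_3^{\frac{1}{2}(n-3)} (1 - L_3)^{-\frac{1}{2}}$

which is the same as the distribution of $L_2$. Therefore by comparing (21) and
(22) we see that the distribution of $L_5$ when $H'_5$ is true will be exactly the same
as that of $L_4$ when $H'_4$ is true. We shall therefore confine ourselves to the
problem of obtaining the distribution of $L_4$ from those of $L_1$ and $L_2$.

Now

(26)    $p(L_1, L_2) = \dfrac{L_1^{\frac{1}{2}(n-4)}\, L_2^{\frac{1}{2}(n-3)}\, (1 - L_1)^{-\frac{1}{2}} (1 - L_2)^{-\frac{1}{2}}}{B[\frac{1}{2}, \frac{1}{2}(n - 2)]\; B[\frac{1}{2}, \frac{1}{2}(n - 1)]}$.

Applying the transformation

(27)    $L_4 = L_1 L_2$
        $Z = L_2$

and integrating with respect to $Z$ from 0 to 1, we obtain

(28)    $p(L_4) = \frac{1}{2}(n - 2)\, L_4^{\frac{1}{2}(n-4)}$,    $0 \le L_4 \le 1$.

Thus we can construct the values of $L_4$ at the 5% and 1% levels for different
values of $n$ as given in Table I.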
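The integration leading to (28) may be written out as follows. This is a sketch under stated assumptions: $L_1$ and $L_2$ are taken to have the beta-type densities $L_1^{\frac{1}{2}(n-4)}(1-L_1)^{-\frac{1}{2}}/B[\frac{1}{2},\frac{1}{2}(n-2)]$ and $L_2^{\frac{1}{2}(n-3)}(1-L_2)^{-\frac{1}{2}}/B[\frac{1}{2},\frac{1}{2}(n-1)]$, written $p_1$ and $p_2$ below, and the integrand is taken to vanish for $Z < L_4$, since $L_1 = L_4/Z$ cannot exceed unity.

```latex
\begin{aligned}
p(L_4) &= \int_{L_4}^{1} p_1\!\left(\frac{L_4}{Z}\right) p_2(Z)\, \frac{dZ}{Z} \\
       &= \frac{L_4^{\frac12(n-4)}}{B[\frac12,\frac12(n-2)]\; B[\frac12,\frac12(n-1)]}
          \int_{L_4}^{1} (Z - L_4)^{-\frac12} (1 - Z)^{-\frac12}\, dZ \\
       &= \frac{\pi\, L_4^{\frac12(n-4)}}{B[\frac12,\frac12(n-2)]\; B[\frac12,\frac12(n-1)]}
        = \tfrac12 (n-2)\, L_4^{\frac12(n-4)} .
\end{aligned}
```

The powers of $Z$ cancel in the second line, the remaining integral is the beta integral $B(\frac{1}{2}, \frac{1}{2}) = \pi$ whatever the value of $L_4$, and the product of the two beta functions reduces to $2\pi/(n-2)$ on writing them in terms of gamma functions.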


TABLE I

5% and 1% values of $L_4$ (or $L_5$)

      n        5%        1%
      5      .1357     .0464
      6      .2236     .1000
      7      .3017     .1585
      8      .3684     .2154
      9      .4249     .2683
     10      .4729     .3162
     12      .5493     .3981
     15      .6307     .4924
     20      .7169     .5995
     24      .7616     .6579
     30      .8074     .7197
     40      .8541     .7848
     60      .9019     .8532
    120      .9505     .9249
      ∞     1.0000    1.0000
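The entries of Table I follow from the density of $L_4$ in closed form. The sketch below is illustrative only: it assumes that density is $\frac{1}{2}(n-2)L^{\frac{1}{2}(n-4)}$ on (0, 1), so that the probability of a value below $L$ is $L^{\frac{1}{2}(n-2)}$, and that small values of the criterion are the significant ones; the function name is hypothetical.

```python
def critical_value(n, alpha):
    """Lower alpha point of L4: the solution of L**((n - 2) / 2) = alpha."""
    return alpha ** (2.0 / (n - 2))

for n in (5, 7, 8, 10, 20, 60):
    print(n,
          round(critical_value(n, 0.05), 4),
          round(critical_value(n, 0.01), 4))
```

For $n = 5$ this gives .1357 and .0464, and for $n = 10$ it gives .4729 and .3162, in agreement with the table.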







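The test of $H_6$. In the case of testing $\sigma_Y^2 = \gamma_0 \sigma_X^2$, assuming $\rho_{XY}$ and
$\xi_X$ each to be zero, the likelihood estimate of $\sigma_X^2$ becomes $\Sigma X^2/n$ or $S_X^2 + \bar{X}^2$.
The distribution of this quantity is the same as that of $S_X^2$ but with degrees of
freedom $n$ instead of $n - 1$. Therefore, by analogy with the previous result
(17) used in testing $H_2$, if we write

(29)    $v = \dfrac{n S_Y^2}{\Sigma X^2} = \dfrac{S_Y^2}{S_X^2 + \bar{X}^2} = \dfrac{1 + R_2}{1 - R_2}$

then the likelihood criterion of $H_6$ becomes

(30)    $L_6 = \dfrac{4v}{\gamma_0 (1 + v/\gamma_0)^2}$

and

(31)    $p(v) = \dfrac{1}{\gamma_0 B[\frac{1}{2}(n - 1), \frac{1}{2} n]} \left(\dfrac{v}{\gamma_0}\right)^{\frac{1}{2}(n-3)} \left(1 + \dfrac{v}{\gamma_0}\right)^{-\frac{1}{2}(2n-1)}$.

Hence the test appropriate to $H_6$ is the associated z-test, $z = \frac{1}{2} \log \dfrac{n v}{(n - 1)\gamma_0}$,
with $f_1 = n - 1$, $f_2 = n$. We can use the z-table as before.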

The test of $H_7$. Here we test whether $\xi_X = 0$. It may be seen that $L_7$ is
a function of $\bar{X}^2/(S_Y^2 + \gamma_0 S_X^2)$. Further, if we assume that $\rho_{XY} = 0$ and also
that $\sigma_Y^2 = \gamma_0 \sigma_X^2$, then it will follow that $\Sigma(X - \bar{X})^2$ and $\frac{1}{\gamma_0}\Sigma(Y - \bar{Y})^2$ are each
distributed independently as $\chi^2 \sigma_X^2$ with $n - 1$ degrees of freedom; and hence
their sum is distributed as $\chi^2 \sigma_X^2$ with $2n - 2$ degrees of freedom. Also if $\xi_X = 0$
(and $H_7$ is true) $\bar{X}$ will be distributed normally about zero with standard error
$\sigma_X/\sqrt{n}$. Hence we may write

(32)    $L_7 = 1 \Big/ \left(1 + \dfrac{t_1^2}{2n - 2}\right)$

where

    $t_1 = \bar{X} \Big/ \sqrt{\dfrac{\Sigma(X - \bar{X})^2 + \Sigma(Y - \bar{Y})^2/\gamma_0}{n(2n - 2)}}$

and is distributed in accordance with "Student's" distribution with $2n - 2$
degrees of freedom,

    $p(t_1) = \dfrac{1}{\sqrt{2n - 2}\; B[\frac{1}{2}, \frac{1}{2}(2n - 2)]} \left(1 + \dfrac{t_1^2}{2n - 2}\right)^{-\frac{1}{2}(2n - 1)}$.

In terms of original variables

(33)    $\dfrac{t_1^2}{2n - 2} = \dfrac{\gamma_0 \bar{X}^2}{\gamma_0 S_X^2 + S_Y^2} = \dfrac{(1 + \rho_0)(\bar{x}_1 - \bar{x}_2)^2}{2(s_1^2 - 2\rho_0 r_{12} s_1 s_2 + s_2^2)}$.



NORMAL BIVARIATE POPULATION 




4. Comparison of the $R_1$-test and $R_2$-test with the $r_{12}$-test in cases where H% and Ht are true respectively.

It will be noted that in the preceding discussion
we have been concerned with three different tests of the hypothesis that $\rho_{12}$
has some specified value $\rho_0$. When there is no information available regarding
the means and standard deviations of $x_1$ and $x_2$, the test is based on the sampling
distribution of the ordinary product-moment coefficient $r_{12}$. If it may be
assumed that $\sigma_1 = \sigma_2$, then we have the estimate

$R_1 = \dfrac{2 r_{12} s_1 s_2}{s_1^2 + s_2^2}$.

If besides $\sigma_1 = \sigma_2$, it may also be assumed that $\xi_1 = \xi_2$, then we have the
estimate

$R_2 = \dfrac{2 r_{12} s_1 s_2 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 + \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}$.

From the point of view of testing hypotheses, all these criteria $r_{12}$, $R_1$, $R_2$
follow from the application of the likelihood ratio method. It will be noted
that if $\sigma_1 = \sigma_2$, either the $r_{12}$ or the $R_1$ test may be used. But, insofar as the
likelihood principle is accepted, the latter should be regarded as the "better"
test. Again, if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, all three tests may be used, but that based
on $R_2$ will be the "best". A question of interest is to investigate just what is
meant by the "better" or the "best" test. We may ask how far the improvements
are sufficient to justify the use of the $R_1$ and $R_2$ tests in place of the more
generally used $r_{12}$ test. One method of comparison is to examine what Neyman
and Pearson [12] have termed the "power function" of the tests.
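The three estimates can be computed side by side from a set of paired observations. The sketch below is illustrative only: the function name `correlation_estimates` and the sample data are ours, and the second moments are taken about the sample means with divisor n, which is assumed here to match the likelihood estimates used in the text (the choice of divisor affects only $R_2$).

```python
from math import sqrt


def correlation_estimates(x1, x2):
    """Return (r12, R1, R2) for paired samples x1, x2 of equal length."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    # Second moments about the sample means, divisor n (assumed convention).
    s1_sq = sum((a - m1) ** 2 for a in x1) / n
    s2_sq = sum((b - m2) ** 2 for b in x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n  # equals r12*s1*s2

    r12 = s12 / sqrt(s1_sq * s2_sq)          # ordinary product-moment coefficient
    R1 = 2 * s12 / (s1_sq + s2_sq)           # assumes sigma_1 = sigma_2
    d = 0.5 * (m1 - m2) ** 2
    R2 = (2 * s12 - d) / (s1_sq + s2_sq + d)  # assumes also xi_1 = xi_2
    return r12, R1, R2


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    x2 = [1.5, 1.9, 3.2, 3.8, 5.6]
    print(correlation_estimates(x1, x2))
```

With this convention $R_2$ coincides with the correlation computed about the common mean ½(x̄₁ + x̄₂) of both variates, and $|R_1|$ never exceeds $|r_{12}|$, since 2s₁s₂ ≤ s₁² + s₂².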

For example, when testing the hypothesis that a parameter $\theta$ has the value
$\theta_0$ in the population sampled, the power of the test criterion T with regard to
the alternative hypothesis that $\theta = \theta_1 > \theta_0$ is given by the expression $\beta(\theta_1) =
P\{T > T_\alpha \mid \theta = \theta_1\}$ where $T_\alpha$ is the value of T at the level of significance $\alpha$.
This quantity $\beta(\theta_1)$ measures the chance that the test as specified will detect
the fact that $\theta \ne \theta_0$, i.e., the chance of rejecting the hypothesis when it is not
true. A test whose power function is never less than that of any other test is
termed the uniformly most powerful test.

If the permissible alternative hypotheses to $\theta = \theta_0$ are both $\theta < \theta_0$ and $\theta > \theta_0$,
then the power of the test T is given by the expression

$\beta(\theta_1) = 1 - P\{T'_\alpha < T < T''_\alpha \mid \theta = \theta_1\}$

where $T'_\alpha$ and $T''_\alpha$ are the values of T at both ends of the distribution at the
level of significance $\alpha$. When the test is such that the power function has
a minimum value $\alpha$ at $\theta = \theta_0$, it is said to be unbiased.

A test is termed biased if, for certain alternative hypotheses $\theta \ne \theta_0$, the chance
of rejecting the hypothesis $\theta = \theta_0$ is less than the chance of rejecting this
hypothesis when it is true.





In what follows it is proposed to compare the power functions of the tests
based on $r_{12}$, $R_1$, and $R_2$ in order to obtain more complete evidence of the
extent to which one is "better" than the other.

The distribution of $R_1$.¹ We have obtained the distribution of u when Ht and
therefore Ht is true. We are now able to find the distribution of $R_1$ by applying
the transformation of (15). Thus the distribution of $R_1$ in terms of $\rho_0$ is

(36) $p(R_1 \mid \rho_{12} = \rho_0) = \dfrac{(1 - \rho_0^2)^{\frac{1}{2}(n-1)} (1 - R_1^2)^{\frac{1}{2}(n-3)}}{2^{n-2} B[\frac{1}{2}(n-1), \frac{1}{2}(n-1)]\, (1 - \rho_0 R_1)^{n-1}}$.

The significance of $R_1$ may be assessed by the z-test, where we take

(37) $z = \frac{1}{2}\log\dfrac{u}{\gamma_0} = \frac{1}{2}\log\dfrac{1 + R_1}{1 - R_1} - \frac{1}{2}\log\dfrac{1 + \rho_0}{1 - \rho_0} = z' - \zeta$, say,

with degrees of freedom $f_1 = f_2 = n - 1$. R. A. Fisher's z-table may be used
in this connection.

When $\rho_{12} = 0$, the distribution simplifies to

(38) $p(R_1 \mid \rho_{12} = 0) = \dfrac{(1 - R_1^2)^{\frac{1}{2}(n-3)}}{2^{n-2} B[\frac{1}{2}(n-1), \frac{1}{2}(n-1)]} = \dfrac{(1 - R_1^2)^{\frac{1}{2}(n-3)}}{B[\frac{1}{2}(n-1), \frac{1}{2}]}$

since $2^{n-2} B[\frac{1}{2}(n-1), \frac{1}{2}(n-1)]$ is equal to $B[\frac{1}{2}(n-1), \frac{1}{2}]$ by the duplication
formula [16, p. 240].

The distribution (38) is similar in form to that of $p(r_{12} \mid \rho_{12} = 0)$ with n − 1
degrees of freedom instead of n − 2. The significance levels of $R_1$ may then
be obtained directly from the r-table [1] for the case $\rho_{12} = 0$, entering with
degrees of freedom n − 1.

The distribution of $R_2$. The distribution of $R_2$ may be obtained from that of
v when H't and therefore Ht is true. It is

(39) $p(R_2 \mid \rho_{12} = \rho_0) = \dfrac{(1 + \rho_0)^{\frac{1}{2}n} (1 - \rho_0)^{\frac{1}{2}(n-1)} (1 + R_2)^{\frac{1}{2}(n-3)} (1 - R_2)^{\frac{1}{2}(n-2)}}{2^{n-\frac{3}{2}} B[\frac{1}{2}(n-1), \frac{1}{2}n]\, (1 - \rho_0 R_2)^{n-\frac{1}{2}}}$.

This agrees with the result first obtained by R. A. Fisher [4] by a different
method. The significance of $R_2$ may be assessed by the z-test, where we take

(40) $z = \frac{1}{2}\log\dfrac{v}{\gamma_0} = \frac{1}{2}\log\dfrac{1 + R_2}{1 - R_2} - \frac{1}{2}\log\dfrac{1 + \rho_0}{1 - \rho_0}$

with degrees of freedom $f_1 = n - 1$, $f_2 = n$. The tables for use with the z-test
may be used in this connection.

When $\rho_{12} = 0$, the distribution is simplified to

(41) $p(R_2 \mid \rho_{12} = 0) = \dfrac{(1 + R_2)^{\frac{1}{2}(n-3)} (1 - R_2)^{\frac{1}{2}(n-2)}}{2^{n-\frac{3}{2}} B[\frac{1}{2}(n-1), \frac{1}{2}n]}$

which is simply a Pearson Type I curve.

¹ Since finding the distribution of $R_1$ (36), (38) and the relation between $R_1$ and z' (37),
my attention has been drawn to a recent paper by DeLury [2] in which the same results
are obtained. Since my method of derivation is different from his, I have thought it
worthwhile to retain it here.

Power functions of $R_1$ and $R_2$. In order to find the power functions of $R_1$ and
$R_2$ with respect to alternative hypotheses Ht to Ht, specifying $\rho_{12} = \rho_t < \rho_0$,
it will be convenient to consider the incomplete beta function distributions

(42) $p(x_1) = \dfrac{1}{B[\frac{1}{2}(n-1), \frac{1}{2}(n-1)]}\, x_1^{\frac{1}{2}(n-3)} (1 - x_1)^{\frac{1}{2}(n-3)}$,

(43) $p(x_2) = \dfrac{1}{B[\frac{1}{2}(n-1), \frac{1}{2}n]}\, x_2^{\frac{1}{2}(n-3)} (1 - x_2)^{\frac{1}{2}(n-2)}$,

where $x_1 = \dfrac{u}{\gamma_0(1 + u/\gamma_0)}$ and $x_2 = \dfrac{v}{\gamma_0(1 + v/\gamma_0)}$. From the Tables of the
Incomplete Beta Function [13] we can find the values of $x_1$ and $x_2$ at the significance
level $\alpha'$, i.e.

(44) $I_{x_1'}[\frac{1}{2}(n-1), \frac{1}{2}(n-1)] = \alpha'$,

(45) $I_{x_2'}[\frac{1}{2}(n-1), \frac{1}{2}n] = \alpha'$.

The values of $R_1'(\alpha)$, and of $R_2'(\alpha)$, may then be calculated from the relations

(46) $R_1 = \dfrac{u - 1}{u + 1} = \dfrac{-1 + x_1 + \gamma_0 x_1}{1 - x_1 + \gamma_0 x_1}$,

(47) $R_2 = \dfrac{v - 1}{v + 1} = \dfrac{-1 + x_2 + \gamma_0 x_2}{1 - x_2 + \gamma_0 x_2}$.

The power functions of $R_1$ and $R_2$ thus found may be given as follows:

(48) $\beta'(\rho_t \mid R_1) = P\{R_1 < R_1'(\alpha) \mid \rho_t\}$,

(49) $\beta'(\rho_t \mid R_2) = P\{R_2 < R_2'(\alpha) \mid \rho_t\}$.

In the same way, for any alternative hypothesis Ht specifying $\rho_{12} = \rho_t > \rho_0$,
we can find the values of $x_1$ and $x_2$ at the significance level $\alpha''$, at the other end
of the distribution, i.e.

(50) $1 - I_{x_1''}[\frac{1}{2}(n-1), \frac{1}{2}(n-1)] = \alpha''$,

(51) $1 - I_{x_2''}[\frac{1}{2}(n-1), \frac{1}{2}n] = \alpha''$.

Thence the corresponding values of $R_1''(\alpha)$ and $R_2''(\alpha)$ may be obtained, and their
power functions are

(52) $\beta''(\rho_t \mid R_1) = P\{R_1 > R_1''(\alpha) \mid \rho_t\}$,

(53) $\beta''(\rho_t \mid R_2) = P\{R_2 > R_2''(\alpha) \mid \rho_t\}$.

The power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
specifying $\rho_{12} = \rho_t < \rho_0$ and $> \rho_0$ may now be obtained by adding (48) and (52) or
(49) and (53) or, more simply,

(54) $\beta(\rho_t \mid R_1) = 1 - P\{R_1'(\alpha) < R_1 < R_1''(\alpha) \mid \rho_t\}$,

(55) $\beta(\rho_t \mid R_2) = 1 - P\{R_2'(\alpha) < R_2 < R_2''(\alpha) \mid \rho_t\}$

where $R_1'(\alpha)$, $R_1''(\alpha)$; $R_2'(\alpha)$, $R_2''(\alpha)$ are the values of $R_1$ and $R_2$ at the two ends
of the distribution at the significance level $\alpha = \alpha' + \alpha''$.

In view of the fact that after transformation the tests based on $R_1$ and $R_2$
are equivalent to tests regarding the equality of variances, it follows from
Neyman and Pearson's work [11] regarding the uniformly most powerful test of the
hypothesis that $\sigma_Y^2/\sigma_X^2 = \gamma_0$, with alternatives $\sigma_Y^2/\sigma_X^2 = \gamma_t < \gamma_0$ (or $\gamma_t > \gamma_0$),
that: (1) if $\sigma_1 = \sigma_2$ and the alternatives to $\rho_{12} = \rho_0$ are that $\rho_{12} = \rho_t < \rho_0$ (or, in a
second case, $\rho_t > \rho_0$) the test based on $R_1$ is the uniformly most powerful test,
i.e., it is more powerful than that based on $r_{12}$; and (2) if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$,
then the test based on $R_2$ is the uniformly most powerful test, i.e., it is more
powerful than those based on either $r_{12}$ or $R_1$.

For illustration, let us take a special case, say

(a) n = 10, $\rho_0$ = 0.6, $\alpha' = \alpha''$ = 0.025.

From the tables, we obtain the values

$x_1'$ = .198902    $x_2'$ = .184863
$x_1''$ = .801098    $x_2''$ = .772916

and by calculation the values

$R_1'(\alpha)$ = −.0034    $R_2'(\alpha)$ = −.0487
$R_1''(\alpha)$ = .8831    $R_2''(\alpha)$ = .8632.

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ have
been calculated and are given in Table II. For $\rho_t < \rho_0$, a comparison of
columns 2 and 4 will show that the test based on $R_2$ is uniformly more powerful
than that based on $R_1$ (or for $\rho_t > \rho_0$, a comparison of columns 3 and 5).
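The calculation for case (a) can be retraced numerically. The sketch below is a minimal illustration, not the author's procedure: it takes the tabled points x′ and x″ quoted above as given, converts them to rejection levels by relations (46) and (47), and evaluates one upper-tail power by re-expressing the critical point on the scale of the alternative $\rho_t$. It assumes $\gamma = (1 + \rho)/(1 - \rho)$, as implied by (37), and replaces the printed tables of the incomplete beta function by a simple Simpson's-rule integration; all function names are ours.

```python
from math import gamma


def beta_cdf(x, a, b, steps=2000):
    """Incomplete beta ratio I_x(a, b) by composite Simpson's rule (steps even)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = f(0.0) + f(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3 * gamma(a + b) / (gamma(a) * gamma(b))


def gamma_ratio(rho):
    """gamma = (1 + rho) / (1 - rho), assumed from relation (37)."""
    return (1 + rho) / (1 - rho)


def to_R(x, g0):
    """Relations (46)-(47): rejection level of R from a tabled beta point x."""
    return (-1 + x + g0 * x) / (1 - x + g0 * x)


def upper_tail_power(x_crit, rho0, rho_t, a, b):
    """P{R > R''(alpha) | rho_t}, given the tabled point x'' found under rho0."""
    u = gamma_ratio(rho0) * x_crit / (1 - x_crit)  # critical value of u (or v)
    x_t = u / (gamma_ratio(rho_t) + u)             # same point on the rho_t scale
    return 1 - beta_cdf(x_t, a, b)


if __name__ == "__main__":
    n, rho0 = 10, 0.6
    g0 = gamma_ratio(rho0)
    print("R1 levels:", to_R(0.198902, g0), to_R(0.801098, g0))
    print("R2 levels:", to_R(0.184863, g0), to_R(0.772916, g0))
    a = b = 0.5 * (n - 1)
    print("power of R1 at rho_t = 0.8:",
          upper_tail_power(0.801098, rho0, 0.8, a, b))
```

The rejection levels come out at the four values quoted above, and the power at $\rho_t$ = 0.8 is close to the .1995 entered for $R_1$ in Table II.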

The unbiased test of Ht and Ht. When however the alternatives are that
$\rho_{12} = \rho_t < \rho_0$, and $\rho_t > \rho_0$, questions of bias may be introduced.

In the case of Ht, i.e. when $R_1$ is used, it was established by J. Neyman in
his lecture courses [8], that if we test whether $\sigma_Y^2/\sigma_X^2 = \gamma_0$, where the alternatives
are $\gamma_t < \gamma_0$ and $\gamma_t > \gamma_0$, and if the samples of X and Y are of equal size, then
the test based on cutting off equal tail areas of the distribution of $x_1$ is unbiased
and of the type B [7]. Therefore the same may be said of the $R_1$-test.

In the case of Ht, the equivalent transformed test is again whether $\sigma_Y^2/\sigma_X^2 = \gamma_0$.
But the test now corresponds to that in which an estimate of $\sigma_Y^2$ is based
on $f_1 = n - 1$ degrees of freedom and an estimate of $\sigma_X^2$ on $f_2 = n$ degrees of
freedom. The degrees of freedom not being equal, it is known that if equal
tail areas are cut off from the sampling distribution of $x_2$, this test will be
biased. Neyman's result [8] shows that if the lower and upper significance
levels are taken at $x_2'$ and $x_2''$, then the equation

(56) $x_2'^{\frac{1}{2}f_1} (1 - x_2')^{\frac{1}{2}f_2} = x_2''^{\frac{1}{2}f_1} (1 - x_2'')^{\frac{1}{2}f_2}$

should be satisfied if the test is unbiased. Since in the present case, with the
test based on equal tail area critical region, the bias will be very small, the
rejection levels $R_2'(\alpha)$ and $R_2''(\alpha)$ in the numerical investigation given in Table
III have been selected taking equal tail areas for simplicity.

TABLE II

Values of the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
$\rho_{12} = \rho_t < \rho_0$ or $\rho_t > \rho_0$
(n = 10; $\rho_0$ = 0.6; $\alpha' = \alpha''$ = 0.025)

   ρt      β'(ρt|R1)   β''(ρt|R1)   β'(ρt|R2)   β''(ρt|R2)

  -0.8      .9984
  -0.6      .9739
  -0.4      .8867
  -0.2      .7189                    .7360
   0.0      .4960       .0002        .5093       .0001
   0.2      .2744       .0008        .2809       .0006
   0.3      .1825       .0018        .1860       .0015
   0.4      .1106       .0042        .1111       .0037
   0.5      .0576       .0099        .0580       .0093
   0.6      .025        .025         .025        .025
   0.7      .0081       .0678        .0080       .0720
   0.8      .0015       .1995        .0015       .2150
   0.9      .0001       .5950        .0001       .6289
   0.95                 .8979                    .9150
   0.975                .9866                    .9897


If we now take a special case, similar to (a) above, but taking equal tail areas,
so that

n = 10, $\rho_0$ = 0.6, $\alpha$ = 0.05 ($\alpha' = \alpha'' = \frac{1}{2}\alpha$)

we can obtain the values of x's and of R's as before.

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ are
given in columns 3 and 4 of Table III. These values are equivalent to the
sums of the corresponding values in Table II. The values of the power functions
of $R_1$ and $R_2$ for the following additional cases are also given in Table III:

(b) n = 10, $\rho_0$ = 0.8, $\alpha$ = 0.05
(c) n = 20, $\rho_0$ = 0.6, $\alpha$ = 0.05
(d) n = 20, $\rho_0$ = 0.8, $\alpha$ = 0.05.


Comparison of the power functions. We may now deal with the question
raised at the beginning of this section, namely, as to what is meant by the
"better" or "best" test. We shall proceed to compare for certain special cases
the power functions of the three tests, all of which are applicable where it may
be assumed that $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

In the first place it will be noted that the power function of the test based on
equal tail areas of the $r_{12}$ distribution is

(57) $\beta(\rho_t \mid r_{12}) = 1 - P\{r_{12}'(\alpha) < r_{12} < r_{12}''(\alpha) \mid \rho_t\}$

where

(58) $P\{r_{12} < r_{12}'(\alpha) \mid \rho_0\} = \displaystyle\int_{-1}^{r_{12}'(\alpha)} p(r_{12} \mid \rho_{12} = \rho_0)\, dr_{12} = \tfrac{1}{2}\alpha$,
     $P\{r_{12} > r_{12}''(\alpha) \mid \rho_0\} = \displaystyle\int_{r_{12}''(\alpha)}^{1} p(r_{12} \mid \rho_{12} = \rho_0)\, dr_{12} = \tfrac{1}{2}\alpha$

and

(59) $p(r_{12} \mid \rho_{12} = \rho_0) = \dfrac{(1 - \rho_0^2)^{\frac{1}{2}(n-1)}}{\pi (n-3)!}\, (1 - r_{12}^2)^{\frac{1}{2}(n-4)} \left(\dfrac{d}{d(\rho_0 r_{12})}\right)^{n-2} \dfrac{\cos^{-1}(-\rho_0 r_{12})}{\sqrt{1 - \rho_0^2 r_{12}^2}}$.

The probability that $r_{12}$ is less than some specified value may be obtained from
Tables of the Correlation Coefficient (F. N. David, [1]), or, where these are not
sufficiently detailed, by using R. A. Fisher's z'-transformation for $r_{12}$ [4].

The cases considered are (a), (b), (c), (d) as defined above. The power
functions of the three different tests (all based upon the equal tail areas of their
distributions) are given in Table III. The figures for $r_{12}$ in the brackets are
those obtained by the z'-transformation approximation.

An examination of Tables II and III brings out the following points:

(1) For reasons given above, the $R_2$ test based on equal tail area critical
regions is very slightly biased; the amount of this bias for the case n = 10,
$\rho_0$ = 0.6, $\alpha$ = 0.05 is shown in Table IV. This shows that the power of the $R_2$
test is less than 0.05 in the fifth or sixth decimal places for $0.59 < \rho_t < 0.60$.
As a result this test is very slightly less powerful than the other two tests for
alternatives with $\rho_t$ slightly less than $\rho_0$. The effect is, however, of little
importance.

(2) Except in this short range of $\rho_t$, we find that

$\beta(\rho_t \mid R_2) > \beta(\rho_t \mid R_1) > \beta(\rho_t \mid r_{12})$.



TABLE III

Comparison of the power functions of $r_{12}$, $R_1$, and $R_2$ tests with respect to alternative hypotheses

(a) n = 10, ρ0 = 0.6, α = 0.05

   ρt        β(ρt|r12)         β(ρt|R1)   β(ρt|R2)
  -0.6       .9739             .9739      .9807
  -0.4       .8865             .8867      .9005
  -0.2       .7186             .7189      .7360
   0.0       .4960             .4962      .5094
   0.2       .2753             .2752      .2815
   0.4       .1142             .1148      .1148
   0.5       .0679             .0675      .0673
   0.6       .0500             .0500      .0500
   0.7       .0735             .0759      .0800
   0.8       .1890             .2010      .2165
   0.9       .5656 (.5569)     .5951      .6290
   0.95      (.8709)           .8979      .9150
   0.975     (.9822)           .9866      .9897
  Levels    -.0089            -.0034     -.0487
             .8998             .8831      .8632

(b) n = 10, ρ0 = 0.8, α = 0.05 (rows in order of increasing ρt)

             β(ρt|r12)         β(ρt|R1)   β(ρt|R2)
             .9887             .9891      .9921
             .9557             .9569      .9650
             .9742             .8766      .8909
             .7158             .7189      .7360
             .4727             .4750      .4877
             .3330             .3345      .3427
             .2005             .2010      .2047
             .0969             .0965      .0971
             .0500             .0500      .0500
             .1466 (.1454)     .1771      .1904
             (.4689)           .5426      .5763
             (.8134)           .8692      .8896
             (.9845)           .9908      .9938
  Levels     .4004             .3817      .3423
             .9574             .9463      .9368

(c) n = 20, ρ0 = 0.6, α = 0.05 (rows in order of increasing ρt)

             β(ρt|r12)         β(ρt|R1)   β(ρt|R2)
             .9965             .9967      .9973
             .9648             .9663      .9698
             .8328             .8369      .8449
             .5412             .5456      .5534
             .2026             .2036      .2061
             .0915             .0917      .0922
             .0500             .0500      .0500
             .1096             .1119      .1147
             .3886             .4010      .4134
             .9034 (.9004)     .9106      .9181
             (.9974)           .9974      .9978
  Levels     .2289             .2253      .2041
             .8300             .8201      .8084

(d) n = 20, ρ0 = 0.8, α = 0.05 (rows in order of increasing ρt)

             β(ρt|r12)         β(ρt|R1)   β(ρt|R2)
             .9952             .9959      .9966
             .9624             .9663      .9698
             .8062             .8170      .8254
             .6309             .6432      .6520
             .3920             .4011      .4085
             .1589             .1617      .1635
             .0500             .0500      .0500
             .3272 (.3270)     .3493      .3604
             (.8547)           .5871      .8844
             (.9944)           .9947      .9960
  Levels     .5671             .5613      .5459
             .9222             .9158      .9101


That is to say, the power function of the $R_2$ test never lies below those of the
$R_1$ and $r_{12}$ tests, and that of the $R_1$ test never lies below that of the $r_{12}$ test.

(3) The gain in sensitivity as measured by the chance that the test will
detect that $\rho_t \ne \rho_0$ is, however, very small. Further, $R_1$ may only be used if
it is known that $\sigma_1 = \sigma_2$ and $R_2$ if it is known in addition that $\xi_1 = \xi_2$. It will
only be in rather special problems that the statistician can feel confident that
such assumptions are justified. We will therefore probably prefer the test based
on the ordinary product moment correlation coefficient $r_{12}$, since the slight loss
in power will be felt to be outweighed by the gain in simplicity. It is, however,
only after an objective comparison of the consequences of applying the three
tests that a definite opinion on these points can be reached.


TABLE IV

   ρt       β'(ρt|R2)    β''(ρt|R2)    β(ρt|R2)

  0.5       .0580        .0093         .0673
  0.590     .0274235     .0225806      .0500041
  0.591     .0271778     .0228190      .0499968
  0.592     .0269359     .0230578      .0499937
  0.593     .0266934     .0232976      .0499910
  0.594     .0264515     .0235337      .0499852
  0.595     .0262096     .0237798      .0499894
  0.596     .0259677     .0240222      .0499899
  0.597     .0257257     .0242651      .0499908
  0.598     .0254838     .0245107      .0499945
  0.599     .0252419     .0247540      .0499959
  0.6       .025         .025          .05


5. Summary. Various hypotheses relating to a population of two normal 
correlated variates have been considered and the appropriate test criteria for 
each hypothesis have been derived by the likelihood ratio method. The dis¬ 
tributions of the likelihood ratio criteria or of monotonic functions of them have 
been obtained with the aid of transformation (14). References have been given 
to tables from which significance levels for use in conjunction with the tests 
may be obtained; a new table of significance levels for the tests of Hi and Ht 
was given. 

The power functions of $r_{12}$, $R_1$ and $R_2$ have been compared; from these power
functions it was concluded that $R_1$ and $R_2$ are suitable respectively for testing
the hypothesis when $\sigma_1 = \sigma_2$ and when, in addition, $\xi_1 = \xi_2$.

In conclusion, I should like to express my indebtedness to Professor E. S. 
Pearson for continued advice and help in the preparation of this paper, to Dr. 
A. Wald and Professor S. S. Wilks for valuable suggestions. 





TABLE V

Conditions defining Ω and ω together with the likelihood criteria appropriate for testing the hypotheses Hi


REFERENCES

[1] F. N. David, Tables of the Correlation Coefficient, London: Biometrika Office, 1938.
[2] D. B. DeLury, Ann. Math. Stat., Vol. 9 (1938), p. 149.
[3] D. J. Finney, Biometrika, Vol. 30 (1938), p. 190.
[4] R. A. Fisher, Metron, Vol. 1 (1921), p. 3.
[5] R. A. Fisher, Statistical Methods for Research Workers, 7th ed. Edinburgh: Oliver and Boyd, 1938.
[6] W. A. Morgan, Biometrika, Vol. 31 (1939), p. 13.
[7] J. Neyman, Bull. Soc. Math. France, Vol. 63 (1935), p. 246.
[8] J. Neyman, Lectures delivered in London, 1937-8, (Unpublished).
[9] J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 175.
[10] J. Neyman and E. S. Pearson, Bull. Acad. Polonaise Sci. Lettres, A (1931), p. 460.
[11] J. Neyman and E. S. Pearson, Phil. Trans., Ser. A, Vol. 236 (1933), p. 289.
[12] J. Neyman and E. S. Pearson, Stat. Res. Mem., Vol. 1 (1936), p. 1.
[13] K. Pearson (Editor), Tables of the Incomplete Beta-Function, London: Biometrika Office.
[14] E. S. Pearson and J. Neyman, Bull. Acad. Polonaise Sci. Lettres, A (1930), p. 73.
[15] E. J. G. Pitman, Biometrika, Vol. 31 (1939), p. 9.
[16] E. T. Whittaker and G. N. Watson, Modern Analysis, 4th Edition (1927).

University College, 

London, England. 



ON A LEAST SQUARES ADJUSTMENT OF A SAMPLED FREQUENCY 
TABLE WHEN THE EXPECTED MARGINAL TOTALS ARE KNOWN 

By W. Edwards Deming and Frederick F. Stephan 

1. Introduction. There are situations in sampling wherein the data fur¬ 
nished by the sample must be adjusted for consistency with data obtained from 
other sources or with deductions from established theory. For example, in the 
1940 census of population a problem of adjustment arises from the fact that 
although there will be a complete count of certain characters for the individuals 
in the population, considerations of efficiency will limit to a sample many of 
the cross-tabulations (joint distributions) of these characters. The tabulations 
of the sample will be used to estimate the results that would have been obtained 
from cross-tabulations of the entire population. 1 The situation is shown in 
Fig. 1 in parallel tables for the universe and for the sample. For the universe 
the marginal totals $N_{i.}$ and $N_{.j}$ are known, but not the cell frequencies $N_{ij}$;
for the sample, however, tabulation gives both the cell frequencies $n_{ij}$ and the
marginal totals $n_{i.}$ and $n_{.j}$.

In estimating any cell frequency of the universe, such as $N_{ij}$, three
possibilities present themselves; from the sample one may make an estimate from
the ith row alone, another from the jth column alone, and still another from the
over-all ratio $n_{ij}/n$: specifically, the three estimates would be $n_{ij}N_{i.}/n_{i.}$,
$n_{ij}N_{.j}/n_{.j}$, and $n_{ij}N/n$. As a result of sampling errors these will not be identical
except by accident, and though any of them by itself may be considered ac¬ 
curate enough, still, if the whole r X s table of universe cell frequencies were so 
estimated, the marginal totals would not come out right. In this paper we 
present a rapid method of adjustment, which in effect combines all three of the 
estimates just mentioned, and at the same time enforces agreement with the 
marginal totals. The method is extended to varying degrees of cross-tabulation 
in three dimensions. 
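For a single cell the three competing estimates are easily set side by side. The figures in the sketch below are invented solely for illustration, and the function name `three_estimates` is ours.

```python
def three_estimates(n_ij, n_i, n_j, n, N_i, N_j, N):
    """Row, column and over-all estimates of the universe cell frequency N_ij.

    n_ij      : sample frequency in the cell
    n_i, n_j  : sample marginal totals of the cell's row and column
    N_i, N_j  : known universe marginal totals of the same row and column
    n, N      : sample and universe sizes
    """
    by_row = n_ij * N_i / n_i
    by_column = n_ij * N_j / n_j
    overall = n_ij * N / n
    return by_row, by_column, overall


if __name__ == "__main__":
    # Hypothetical 5 per cent sample: 20 of the 200 sampled persons fall in the cell.
    print(three_estimates(n_ij=20, n_i=50, n_j=40, n=200,
                          N_i=1100, N_j=760, N=4000))
```

The three answers differ, which is exactly the inconsistency that the adjustment described in this paper is meant to remove.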

¹ Examples will occur in the 1940 census publications. Further discussion of this
problem and of the sampling procedure is given by the authors in "The sampling procedure
of the 1940 population census," Jour. Am. Stat. Assn., Vol. 36 (1940), pp. 615-630.

In any problem of adjustment where the conditions are intricate it is necessary
to have a method that is straight-forward and self-checking; this becomes
imperative when we realize that in the three-dimensional Case VII of the
problem now at hand (vide infra), any adjustment in one cell must be balanced
by adjustments in at least seven others. The method of least squares is one
possible procedure for effecting an adjustment and at the same time enforcing
certain conditions among the marginal totals. It is essentially a scheme for
arriving at a set of calculated or adjusted observations that will satisfy the
conditions of the problem, and at the same time minimize the sum of
the weighted squares of the residuals, symbolized as

(1) $S = \Sigma\, w(n_c - n_o)^2$

$n_c$ and $n_o$ being the calculated and observed numbers in a cell, and $n_c - n_o$ the
corresponding residual. It is the nature of the conditions imposed on the adjusted
values that distinguishes one type of problem from another. Least
squares has the practical advantage of uniqueness, once the weights of the
observations have been assigned, and it possesses the theoretical dignity of giving
one kind of "best" estimates under ideal conditions of sampling. For our
present purpose we shall minimize sums of the form

(2) $S = \Sigma\, (m_i - n_i)^2/n_i$

$n_i$ being the observed frequency in the ith cell, and $m_i$ the calculated or adjusted
frequency therein. The conditions among the $m_i$ will arise from the fact that
the marginal totals, after adjustment, must agree with their expected values,
namely, the deflated marginal totals of the universe (for example, $m_{i.}$ and $m_{.j}$ as
defined in eqs. (6) and (7)).

By definition, weight and variance are inversely proportional, hence the
principle of least squares is identical with the minimizing of chi-square. Here
the variance in the ith cell is $\nu_i(1 - \nu_i/n)$, where $\nu_i$ is the expected number in
that cell, and n is the total number in the sample. Now if $\nu_i$ is sufficiently
well approximated by $n_i$, it follows that if no cell contains an appreciable
fraction of the whole sample (a circumstance requiring a fair sized number of
cells, perhaps 100), the variance may be taken as $\nu_i$ for every i, and the
minimized S can be used as chi-square. But regardless of the number of cells, if
the $n_i$ be not too much different from one another, so that the factor $1 - \nu_i/n$
may be treated as a constant, we still get the least squares solution by minimizing
S as defined in eq. (2).

2. The two dimensional problem. Suppose that the data on two characteristics
(e.g. age and highest grade of school completed) are obtained for each
member of a universe of N individuals, and that tabulations of the data provide
either (a) one set of marginal totals $N_{1.}, N_{2.}, \ldots, N_{r.}$; or (b) in addition, the
marginal totals $N_{.1}, N_{.2}, \ldots, N_{.s}$. The nature of the tabulations is presumed
such that it is not feasible to count the numbers $N_{ij}$ in the cells, as would be
done if one character were crossed with the other. Suppose, however, that for
a sample of n individuals selected in a random manner from the universe, the
two characters are crossed with each other, so that we know not only all the
s + r marginal totals $n_{.1}, \ldots, n_{r.}$ of the sample but also the numbers $n_{ij}$
(i = 1, 2, ..., r; j = 1, 2, ..., s). The problem is to estimate the unknown
frequencies $N_{ij}$ in the cells of the universe. This will be done by finding the
calculated or adjusted sample frequencies $m_{ij}$ and then inflating them by the
inverse sampling ratio N/n.


For the least squares solution we seek those values of $m_{ij}$ that minimize*

(3) $S = \Sigma\, (m_{ij} - n_{ij})^2/n_{ij}$

wherein the $m_{ij}$ are subjected to one of the following sets of conditions:

Case I: One set of marginal totals known. Assume $N_{1.}, N_{2.}, \ldots, N_{r.}$ to be
known. Then we require

(4) $\sum_j m_{ij} = m_{i.}$, i = 1, 2, ..., r.

These r equations constitute r conditions on the adjusted $m_{ij}$.


[Fig. 1: parallel r × s tables for the universe and for the sample. Universe: cell frequencies $N_{ij}$ unknown; marginal totals $N_{.j}$ and $N_{i.}$ known; N known. Sample: cell frequencies $n_{ij}$ known; marginal totals $n_{.j}$ and $n_{i.}$ known; n known.]

Fig. 1. Showing the System of Notation for the Cell Frequencies and Marginal
Totals of the Universe and the Sample in the Two Dimensional Problem

Case II: Both sets of marginal totals known. Here the adjusted cell frequencies
must satisfy not only condition (4) but also

(5) $\sum_i m_{ij} = m_{.j}$, j = 1, 2, ..., s − 1

there being now a total of r + s − 1 conditions. In both cases,

(6) $m_{i.} = N_{i.}\, n/N$,

(7) $m_{.j} = N_{.j}\, n/N$.

In other words, $m_{i.}$ and $m_{.j}$ are the deflated marginal totals, i.e., $N_{i.}$ and $N_{.j}$
divided by the actual sampling ratio N/n. The $m_{i.}$ and $m_{.j}$ are not independent,
for

(8) $N_{.1} + N_{.2} + \cdots + N_{.s} = N_{1.} + N_{2.} + \cdots + N_{r.} = N$.

It is for this reason that if i runs through all r values in eq. (4), then j can run
through only s − 1 in eq. (5). A similar equation also exists for the marginal
totals of the sample, namely,

(9) $n_{.1} + n_{.2} + \cdots + n_{.s} = n_{1.} + n_{2.} + \cdots + n_{r.} = n$.

* The sign $\Sigma$ will denote summation over all possible cells, unless otherwise noted.
$\sum_i$ will denote summation over all values of i, and similarly for an inferior j or k. The
dot, as in $n_{.j}$, will signify the result of summing the $n_{ij}$ over all values of i in the jth
column.

Solution of the two dimensional Case I. Assuming that the adjusted values
of the $m_{ij}$ have been found, let each take on a small variation $\delta m_{ij}$; then the
differentials of eqs. (3) and (4) show that

(10) $\tfrac{1}{2}\,\delta S = \Sigma\{(m_{ij} - n_{ij})/n_{ij}\}\,\delta m_{ij} = 0$ (one equation),

(11) $\sum_j \delta m_{ij} = 0$, i = 1, 2, ..., r (r equations).

Multiply now eq. (11i) by the arbitrary Lagrange multiplier $-\lambda_{i.}$, and add eqs.
(10) and (11) to obtain

(12) $\Sigma\{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.}\}\,\delta m_{ij} = 0$ (one equation).

By the usual argument, one may now set each brace equal to zero, recognizing
that the r Lagrange multipliers are then no longer arbitrary but must satisfy
the relation

(13) $m_{ij} = n_{ij}(1 + \lambda_{i.})$.

The adjusted frequencies $m_{ij}$ can be computed at once as soon as the $\lambda_{i.}$ are
found. To evaluate them one may rewrite the conditions (4) using the right-hand
member of (13) for $m_{ij}$, obtaining

(14) $m_{i.} = n_{i.}(1 + \lambda_{i.})$.

Another way to arrive at this same relation is to sum each member of eq. (13)
in the ith row. However obtained, $\lambda_{i.}$ is now known, since $m_{i.}$ and $n_{i.}$ are
known, and in fact eq. (13) now gives

(15) $m_{ij} = n_{ij}(m_{i.}/n_{i.})$.

The adjustment is thus a simple proportionate one by rows, the cells in any one
row all being raised or lowered by the proportionate adjustment in the row total.
Case I thus amounts to r independent one dimensional proportionate adjustments,
one for each row, and any one or all may be carried out, as desired.
This result can be obtained by a simpler approach but is presented in this way
for consistency with later cases.

The minimized sum of squares may be computed directly, or from the row
totals by seeing that

(16) $S = \sum_i (m_{i.} - n_{i.})^2/n_{i.}$.

The term $(m_{i.} - n_{i.})^2/n_{i.}$ for the ith row may be considered separately, and
used as $\chi^2$ with s − 1 degrees of freedom, or all rows may be combined into
the minimized S as given in eq. (16), and used as $\chi^2$ with r(s − 1) degrees of
freedom.
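The Case I adjustment is short enough to write out in full. The sketch below, with invented frequencies and function names of our own choosing, applies the proportionate adjustment of eq. (15) row by row and then computes the minimized S twice, once cell by cell from eq. (3) and once from the row totals as in eq. (16).

```python
def adjust_rows(n, m_row):
    """Eq. (15): scale each row of n so that it sums to its target m_i."""
    adjusted = []
    for row, target in zip(n, m_row):
        factor = target / sum(row)
        adjusted.append([cell * factor for cell in row])
    return adjusted


def sum_of_squares(n, m):
    """Eq. (3): S = sum over all cells of (m_ij - n_ij)^2 / n_ij."""
    return sum((mc - nc) ** 2 / nc
               for n_row, m_row in zip(n, m)
               for nc, mc in zip(n_row, m_row))


def sum_of_squares_from_rows(n, m_row):
    """Eq. (16): the same S computed from the row totals alone."""
    return sum((target - sum(row)) ** 2 / sum(row)
               for row, target in zip(n, m_row))


if __name__ == "__main__":
    n = [[10.0, 20.0, 30.0], [40.0, 25.0, 35.0]]  # hypothetical sample table
    m_row = [64.0, 96.0]                           # hypothetical deflated row totals
    m = adjust_rows(n, m_row)
    print(m)
    print(sum_of_squares(n, m), sum_of_squares_from_rows(n, m_row))
```

The two values of S agree, which serves as a convenient check on the arithmetic.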

Solution of the two dimensional Case II. In addition to eqs. (11) we now
have also

(17) $\sum_i \delta m_{ij} = 0$, j = 1, 2, ..., s − 1

which comes by differentiating eqs. (5). By addition of eqs. (10), (11), and
(17), after multiplying eq. (11i) by $-\lambda_{i.}$ and eq. (17j) by $-\lambda_{.j}$, we obtain

(18) $\Sigma\{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.} - \lambda_{.j}\}\,\delta m_{ij} = 0$.

Equating each brace to zero, as before, we find that

(19) $m_{ij} = n_{ij}(1 + \lambda_{i.} + \lambda_{.j})$

wherein $\lambda_{.s}$ is to be counted 0. The adjustment is now no longer proportionate
by rows, but involves every cell.

To evaluate the Lagrange multipliers in eq. (19) we may sum the two members
downward and across in Fig. 1 and obtain the r + s − 1 normal equations

(20) $n_{i.}\lambda_{i.} + \sum_j n_{ij}\lambda_{.j} = m_{i.} - n_{i.}$, i = 1, 2, ..., r
     $\sum_i n_{ij}\lambda_{i.} + n_{.j}\lambda_{.j} = m_{.j} - n_{.j}$, j = 1, 2, ..., s − 1.

These can be reduced for numerical computation. The top row solved for
$\lambda_{i.}$ gives

(21) $\lambda_{i.} = (1/n_{i.})\{m_{i.} - \sum_j n_{ij}\lambda_{.j}\} - 1$

whereupon by substitution into the bottom row of eqs. (20) we arrive at the
s − 1 normal equations

(22)
  $\lambda_{.1}$                              $\lambda_{.2}$                              ...   $\lambda_{.s-1}$                                  = 1

  $n_{.1} - \sum_i n_{i1}n_{i1}/n_{i.}$    $-\sum_i n_{i1}n_{i2}/n_{i.}$         ...   $-\sum_i n_{i1}n_{i,s-1}/n_{i.}$              $= m_{.1} - \sum_i n_{i1}m_{i.}/n_{i.}$
                                      $n_{.2} - \sum_i n_{i2}n_{i2}/n_{i.}$    ...   $-\sum_i n_{i2}n_{i,s-1}/n_{i.}$              $= m_{.2} - \sum_i n_{i2}m_{i.}/n_{i.}$
                                                                          ...   ...
                                                                                $n_{.s-1} - \sum_i n_{i,s-1}n_{i,s-1}/n_{i.}$   $= m_{.s-1} - \sum_i n_{i,s-1}m_{i.}/n_{i.}$
                                                                                                                        0

Because of symmetry in the coefficients, those below the diagonal are not shown;
indeed, in a systematic computation, they are not used. The 0 in the bottom
row is appended for the computation of the minimized S, if desired. The
number of Lagrange multipliers to be solved for directly is s − 1, and the
remaining ones come by substitution into eq. (21), $\lambda_{.s}$ being counted 0.
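The reduction just described can be followed step by step in a short program. The sketch below is one possible arrangement, not a transcription of the authors' computing form: it builds the s − 1 normal equations in the column multipliers, solves them by ordinary Gaussian elimination, recovers the row multipliers from eq. (21), and applies eq. (19). The frequencies, the marginal totals and all function names are invented for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for c in range(k - 1, -1, -1):
        tail = sum(M[c][j] * x[j] for j in range(c + 1, k))
        x[c] = (M[c][k] - tail) / M[c][c]
    return x


def adjust_case2(n, m_row, m_col):
    """Least squares adjustment of n to both sets of marginal totals."""
    r, s = len(n), len(n[0])
    n_row = [sum(row) for row in n]
    n_col = [sum(n[i][j] for i in range(r)) for j in range(s)]
    # Normal equations (22) in lambda_.1 ... lambda_.(s-1); lambda_.s counts 0.
    A = [[(n_col[j] if j == k else 0.0)
          - sum(n[i][j] * n[i][k] / n_row[i] for i in range(r))
          for k in range(s - 1)] for j in range(s - 1)]
    b = [m_col[j] - sum(n[i][j] * m_row[i] / n_row[i] for i in range(r))
         for j in range(s - 1)]
    lam_col = solve(A, b) + [0.0]
    # Row multipliers by substitution into eq. (21).
    lam_row = [(m_row[i] - sum(n[i][j] * lam_col[j] for j in range(s))) / n_row[i] - 1
               for i in range(r)]
    # Adjusted frequencies, eq. (19).
    return [[n[i][j] * (1 + lam_row[i] + lam_col[j]) for j in range(s)]
            for i in range(r)]


if __name__ == "__main__":
    n = [[10.0, 20.0, 30.0], [40.0, 25.0, 35.0]]  # hypothetical sample table
    m = adjust_case2(n, m_row=[64.0, 96.0], m_col=[52.0, 44.0, 64.0])
    for row in m:
        print([round(cell, 3) for cell in row])
```

Both sets of adjusted marginal totals agree with the totals supplied, the last column coming out right automatically because the two sets of totals have the same sum.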

A simple procedure for calculating the coefficients in the normal equations
(22) is to set up a preparatory table by dividing each $n_{ij}$ in the ith row by $\sqrt{n_{i.}}$;
also to write down $m_{i.}/\sqrt{n_{i.}}$ for that row, for use on the right-hand side of the
normal equations (compare Tables I and II). In machine calculation the
constant divisor $\sqrt{n_{i.}}$ would be left on the keyboard until the entire ith row is
divided; or, if reciprocal multiplication is preferred, the multiplier $1/\sqrt{n_{i.}}$ would
be left on. From this preparatory table, the cumulation of squares and
cross-products in the vertical gives the required summations for the coefficients. The
sum check would be applied in the usual manner.


3. A numerical example of the two dimensional Case II. The fact is that 
in practice one need not bother about forming and solving the normal equations 
because they will be displaced by a simplifying iterative procedure, to be ex¬ 
plained in a later section. For illustration, however, we may do an example 
both ways, first using the normal equations and the adjustment (19), later on 
accomplishing the same results by the quicker method. 

We may start with the unitalicized numbers in the 4 × 6 array of Table I,
assuming these to be the sampling frequencies $n_{ij}$ to be adjusted. Actually,
they were obtained by deflating to 1/20th (for a supposed 5 per cent sample) the
New England age × state table on p. 1108 of vol. 2 of the Fifteenth Census of
the U.S., 1930, then varying the deflated values by chance with Tippett's
numbers to get our sampling frequencies $n_{ij}$. The italicized entries in Table I
represent the final (adjusted) $m_{ij}$, and it is these that we now set out to get.
We start off with the sampling frequencies and the known marginal totals
$m_{.1}$, $m_{.2}$, etc., where $m_{i.} = N_{i.}n/N$, $m_{.j} = N_{.j}n/N$, as in eqs. (6) and (7).
The Lagrange multipliers shown along the left-hand and top borders arise in the
calculations now to be undertaken.

Table II is the preparatory table, advised at the close of the last section. It
is derived from Table I by dividing the $i$th row of sample frequencies by $\sqrt{n_{i.}}$.
For example, the entry 8.64 in the cell $i = 3$, $j = 2$ comes by dividing 419 by
$\sqrt{2352}$, 419 being the entry in the cell of the same indices in Table I, and 2352
being the sum of the third row. The sums at the bottom and right-hand side
are for checking the formation of the normal equations. The cumulations of
squares and cross-products along the vertical give the summations required for
the normal eqs. (22), which now appear numerically as eqs. (23).


(23)
        No.      λ.1       λ.2       λ.3      =      1

         1      7413     -3549     -2354      =    3197 × 10^-2
         2                4441      -544      =    2356
         3                          3129      =   -3222
         4                                            0
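The formation of eqs. (23) from the preparatory table may be sketched in a few lines of Python. This is only an illustration, not the authors' machine routine: the function name `normal_equations` is ours, and the sample frequencies are those of Table I (cells that are hard to read in Table I were recovered from the printed row and column totals). Because the full precision of the machine is carried here instead of the two-decimal entries of Table II, the coefficients should agree with eqs. (23) to about a unit, while the right-hand members, being small differences of large numbers, may differ somewhat more.

```python
from math import sqrt

# Sample frequencies n_ij of Table I (rows: six states, columns: four age classes).
n = [
    [3623, 781, 557, 313],
    [1570, 395, 251, 155],
    [1553, 419, 264, 116],
    [10538, 2455, 1706, 1160],
    [1681, 353, 171, 154],
    [3882, 857, 544, 339],
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]  # expected row totals m_i.
m_col = [22877, 5285, 3462, 2213]              # expected column totals m_.j


def normal_equations(n, m_row, m_col):
    """Coefficients and right-hand members of the reduced normal equations (22)."""
    r, s = len(n), len(n[0])
    n_row = [sum(row) for row in n]
    n_col = [sum(n[i][j] for i in range(r)) for j in range(s)]
    # Preparatory table: row i divided by sqrt(n_i.), with m_i./sqrt(n_i.) appended.
    prep = [
        [n[i][j] / sqrt(n_row[i]) for j in range(s)] + [m_row[i] / sqrt(n_row[i])]
        for i in range(r)
    ]
    coeff = [[0.0] * (s - 1) for _ in range(s - 1)]
    rhs = [0.0] * (s - 1)
    for j in range(s - 1):
        for k in range(s - 1):
            # Cumulation of squares and cross-products in the vertical.
            cross = sum(prep[i][j] * prep[i][k] for i in range(r))
            coeff[j][k] = (n_col[j] if j == k else 0.0) - cross
        rhs[j] = m_col[j] - sum(prep[i][j] * prep[i][s] for i in range(r))
    return coeff, rhs


coeff, rhs = normal_equations(n, m_row, m_col)
for row, b in zip(coeff, rhs):
    print([round(c) for c in row], round(b, 2))
```

The full square array is formed here for simplicity; by symmetry only the part on and above the diagonal is needed in hand work.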





Performing the solution by any favorite procedure one will obtain

(24)  $\lambda_{.1} = .01182, \qquad \lambda_{.2} = .01490, \qquad \lambda_{.3} = .00119,$
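As one possible "favorite procedure," the three equations (23) may be solved by ordinary elimination. The Python sketch below is an illustration under our own assumptions (the helper name `solve` is hypothetical, and the coefficients are simply copied from eqs. (23)); it is not the computing scheme the authors had in mind.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(k)]
    for col in range(k):
        # Bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            f = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= f * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


# Coefficients and right-hand members of eqs. (23), written out as a full symmetric array.
coeff = [[7413, -3549, -2354],
         [-3549, 4441, -544],
         [-2354, -544, 3129]]
rhs = [31.97, 23.56, -32.22]

lam_col = solve(coeff, rhs)
print([round(v, 5) for v in lam_col])
```

The values so found should reproduce eq. (24) to within a unit or so in the fifth decimal.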

TABLE I

A table of artificial sample frequencies, an artificial 5 percent sample of native
white persons of native white parentage attending school, by age by state, New
England, 1930. The adjusted frequency m_ij in each cell (italicized in the
original; marked m here) is shown just below the corresponding sample frequency n_ij.

                                              Age class j
State             i     λ_i.             1        2        3        4      Total
                        λ_.j           .0118    .0149    .0012      0

Maine             1    -.0146    n      3623      781      557      313     5274
                                 m      3613      781      550      ...     5252
New Hampshire     2    -.0003    n      1570      395      251      155     2371
                                 m      1588      401      251      ...     2395
Vermont           3    +.0234    n      1553      419      264      116     2352
                                 m      1608      435      270      119     2432
Massachusetts     4    -.0162    n     10538     2455     1706     1160    15859
                                 m     10492     2452     1680     1141    15766
Rhode Island      5    -.0230    n      1681      353      171      154     2359
                                 m      1662      350      167      150     2330
Connecticut       6    -.0034    n      3882      857      544      339     5622
                                 m      3916      867      543      338     5662

                        n_.j           22847     5260     3493     2237    33837
                        m_.j           22877     5285     3462     2213    33837

The adjusted m_ij (italicized) are rounded off, hence when summed may occasionally
disagree a unit or so with the expected marginal totals (also italicized); the latter arise
by deflation from the universe rather than by direct addition of the m_ij.


whereupon by substitution into eq. (21) comes

(25)  $\lambda_{1.} = -.0146, \quad \lambda_{2.} = -.0003, \quad \lambda_{3.} = +.0234, \quad \lambda_{4.} = -.0162, \quad \lambda_{5.} = -.0230, \quad \lambda_{6.} = -.0034.$

The next step is to compute the $m_{ij}$ by eq. (19). Table I is now bordered
with the Lagrange multipliers for a convenient arrangement of the factors
required, and the calculation is completed. It will be noted that, for example,

(26)  $m_{32} = 419(1 + .0234 + .0149) = 435.$

The $m_{ij}$ thus calculated are shown italicized in Table I. The marginal totals,
found by adding the $m_{ij}$ just calculated, do not agree exactly everywhere with
the expected totals, because of rounding off to integers; the errors of closure,
however, are slight, and it is a simple matter to raise or lower some of the larger
cells by a unit or two to force exact satisfaction of the conditions, if this is
desired.
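The substitution into eq. (21) and the adjustment by eq. (19) can be retraced as follows. The sketch assumes the sample frequencies of Table I and the column multipliers of eq. (24), with the fourth counted zero; the helper name `row_multiplier` is ours and purely illustrative.

```python
# Sample frequencies n_ij and expected marginal totals of Table I.
n = [
    [3623, 781, 557, 313],
    [1570, 395, 251, 155],
    [1553, 419, 264, 116],
    [10538, 2455, 1706, 1160],
    [1681, 353, 171, 154],
    [3882, 857, 544, 339],
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]
m_col = [22877, 5285, 3462, 2213]

# Column multipliers from eq. (24); the last one is counted zero.
lam_col = [0.01182, 0.01490, 0.00119, 0.0]


def row_multiplier(row, m_i, lam_col):
    """Eq. (21): lambda_i. = (m_i. - n_i. - sum_j n_ij * lambda_.j) / n_i."""
    n_i = sum(row)
    return (m_i - n_i - sum(nij * lj for nij, lj in zip(row, lam_col))) / n_i


lam_row = [row_multiplier(row, m_i, lam_col) for row, m_i in zip(n, m_row)]

# Eq. (19): m_ij = n_ij * (1 + lambda_i. + lambda_.j).
m = [[nij * (1 + li + lj) for nij, lj in zip(row, lam_col)]
     for row, li in zip(n, lam_row)]

print([round(v, 4) for v in lam_row])
print(round(m[2][1]))  # the cell i = 3, j = 2 of eq. (26)
```

Because each row multiplier is obtained from eq. (21), the row totals of the unrounded $m_{ij}$ close exactly; the column totals close only as nearly as the rounded multipliers of eq. (24) permit.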

TABLE II

This comes by dividing each sample frequency in Table I by the corresponding √n_i.
(This operation would ordinarily be done a row at a time.)

                       n_ij / √n_i.
           j = 1        2        3        4      m_i./√n_i.      Sum

i = 1      49.89     10.75     7.67     4.31        72.32      144.94
    2      32.24      8.11     5.15     3.18        49.19       97.87
    3      32.02      8.64     5.44     2.39        50.15       98.64
    4      83.68     19.49    13.55     9.21       125.19      251.12
    5      34.61      7.27     3.52     3.17        47.97       96.54
    6      51.77     11.43     7.26     4.52        75.51      150.49

  Sum     284.21     65.69    42.59    26.78       420.33      839.60


4. The three dimensional problem. Here the N cards of the universe are
sorted and counted for one and perhaps a second and third characteristic, and
possibly crossed by pairs in various combinations (Cases I-VII). The sample
of n, however, is crossed by all three characteristics, which is to say that the
cell frequencies $n_{ijk}$ are all known (refer to Fig. 2). As before, the adjusted
frequencies $m_{ijk}$ are required.

Case I: One set of slice totals known. Assume the slice totals $N_{1..}, N_{2..}, \ldots, N_{r..}$
to be known; the conditions are then

(27)  $\sum_{jk} m_{ijk} = m_{i..} = N_{i..}n/N, \qquad i = 1, 2, \ldots, r,$

being $r$ in number. The summation to be minimized is

(28)  $S = \sum (m_{ijk} - n_{ijk})^2 / n_{ijk},$

being similar to that in eq. (3), except that now there are three indices to be
summed over instead of two. Following a procedure similar to that used before,
we differentiate eqs. (27) and (28) and introduce the $r$ Lagrange multipliers $\lambda_{i..}$
with eq. (27). The steps are identical with those of the two dimensional Case I,
and the result is at once

(29)  $m_{ijk} = n_{ijk}(1 + \lambda_{i..}) = n_{ijk}(m_{i..}/n_{i..}).$

This adjustment, like that shown by eq. (15), is a simple proportionate one, but
this time by slices rather than by columns. All cell frequencies having the same
$i$ index are raised or lowered in the same proportion.



Fig. 2. Showing the System of Notation for the Cell Frequencies and Marginal 
Totals in the Three Dimensional Sample 

Case II: Two sets of slice totals known. Here, in addition to the slice totals
of Case I we know also

$N_{.1.}, N_{.2.}, \ldots, N_{.s.}$

whence arise the $s - 1$ additional conditions

(30)  $\sum_{ik} m_{ijk} = m_{.j.} = N_{.j.}n/N, \qquad j = 1, 2, \ldots, s - 1.$

Using the Lagrange multiplier $\lambda_{.j.}$ here, and $\lambda_{i..}$ with eq. (27) as before, we
find that

(31)  $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.})$

in which $\lambda_{.s.}$ is to be counted zero. This adjustment is proportionate by tubes,
the ratio being constant along the $ij$th tube and in fact equal to
$m_{ij.}/n_{ij.}$, independent of $k$. Unfortunately we do not here know the face totals
$m_{ij.}$ and are unable to make use of the proportionality as we shall in Case IV.
To solve for the $r + s - 1$ Lagrange multipliers we sum the members of eq.
(31) over $j$ and $k$, and then over $i$ and $k$, and arrive at the normal equations

(32)  $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} = m_{i..} - n_{i..}, \qquad i = 1, 2, \ldots, r,$

      $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} = m_{.j.} - n_{.j.}, \qquad j = 1, 2, \ldots, s - 1.$

These can be reduced to $s - 1$ equations in precisely the same way that eqs.
(20) were reduced, but because of the iterative process to come further on, we
shall not pursue the reduction here.

Case III: All three sets of slice totals known. All slice totals

$N_{1..}, N_{2..}, \ldots, N_{r..}$

$N_{.1.}, N_{.2.}, \ldots, N_{.s.}$

$N_{..1}, N_{..2}, \ldots, N_{..t}$

now being known, in addition to conditions (27) and (30) we require here

(33)  $\sum_{ij} m_{ijk} = m_{..k} = N_{..k}n/N, \qquad k = 1, 2, \ldots, t - 1,$

which makes a total of $r + (s - 1) + (t - 1)$ or $r + s + t - 2$ conditions.
The same kind of manipulation as used heretofore gives

(34)  $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.} + \lambda_{..k})$

with $\lambda_{.s.}$ and $\lambda_{..t}$ to be counted zero. The adjustment is no longer
proportionate by slices or tubes, but involves every cell. In practice, once the normal
equations are solved and the Lagrange multipliers worked out, one proceeds
very much as in the two dimensional Case II: for each of the $t$ slices,
corresponding to the $t$ values of $k$, there will be a two dimensional adjustment, the
1 in eq. (19) being replaced now by $1 + \lambda_{..k}$.

The normal equations for the Lagrange multipliers can be found by
performing double summations on eq. (34). The result is

(35)  $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} + \sum_k n_{i.k}\lambda_{..k} = m_{i..} - n_{i..}, \qquad i = 1, 2, \ldots, r,$

      $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} + \sum_k n_{.jk}\lambda_{..k} = m_{.j.} - n_{.j.}, \qquad j = 1, 2, \ldots, s - 1,$

      $\sum_i n_{i.k}\lambda_{i..} + \sum_j n_{.jk}\lambda_{.j.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}, \qquad k = 1, 2, \ldots, t - 1.$

If these calculations were to be carried out, one would simplify the computation
by solving the top row for $\lambda_{i..}$, getting

(36)  $\lambda_{i..} = \dfrac{1}{n_{i..}}\Bigl\{m_{i..} - \sum_j n_{ij.}\lambda_{.j.} - \sum_k n_{i.k}\lambda_{..k}\Bigr\} - 1$

and then substituting this into the middle and last rows of eqs. (35) to get a
reduced set of $s + t - 2$ normal equations for the Lagrange multipliers $\lambda_{.j.}$
and $\lambda_{..k}$, the numerical values of which when set back into eq. (36) give the $\lambda_{i..}$.
In all the summations of eqs. (35) and (36), $\lambda_{.s.}$ and $\lambda_{..t}$ would be counted zero.
But here again, the iterative process to be explained later will displace the use
of normal equations, so actually we are not interested in reducing them.

Case IV: One set of face totals known. It may be that the $rs$ face totals

$N_{11.}, N_{12.}, \ldots, N_{ij.}, \ldots, N_{rs.}$

are known from crossing the $i$ and $j$ characters in the universe. The conditions
are then

(37)  $\sum_k m_{ijk} = m_{ij.} = N_{ij.}n/N, \qquad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, s.$

The adjustment here turns out to be

(38)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.});$

but by summing both sides over the index $k$ to evaluate $\lambda_{ij.}$ it is seen that

(39)  $m_{ij.} = n_{ij.}(1 + \lambda_{ij.}),$

whence

(40)  $m_{ijk} = n_{ijk}(m_{ij.}/n_{ij.}).$

This adjustment is thus proportionate by tubes, like that in eq. (31), though
here the factor $m_{ij.}/n_{ij.}$ is known and eq. (40) can be applied at once.
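The adjustment (40) is simple enough to be written in a single expression. The following Python sketch uses a small hypothetical 2 × 2 × 2 sample, with `n[i][j]` holding the tube of $t$ cell frequencies and `m_face[i][j]` the expected face total $m_{ij.}$; the numbers and the function name are ours, for illustration only.

```python
def adjust_by_tubes(n, m_face):
    """Eq. (40): every cell of tube (i, j) is multiplied by m_ij. / n_ij."""
    return [
        [[x * m_face[i][j] / sum(tube) for x in tube] for j, tube in enumerate(face)]
        for i, face in enumerate(n)
    ]


# Hypothetical sample frequencies n_ijk (r = s = t = 2) and expected face totals m_ij.
n = [[[12, 8], [30, 10]],
     [[5, 15], [20, 20]]]
m_face = [[22.0, 38.0],
          [21.0, 39.0]]

m = adjust_by_tubes(n, m_face)
print(m[0][0])  # the tube i = 1, j = 1 after adjustment
```

Each tube is treated independently of the others, which is why no normal equations arise in this case.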

Case V: One set of face totals, and one set of slice totals known. Sometimes, in
addition to the $rs$ face totals of Case IV, the slice totals

$N_{..1}, N_{..2}, \ldots, N_{..t}$

will also be known, in which circumstances the conditions (37) are to be
accompanied by

(41)  $\sum_{ij} m_{ijk} = m_{..k} = N_{..k}n/N, \qquad k = 1, 2, \ldots, t - 1.$

The same procedure as previously applied yields now

(42)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{..k})$

with $\lambda_{..t}$ to be counted zero. Summations performed over $k$, and then over $i$
and $j$ together, give the normal equations

(43)  $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{..k} = m_{ij.} - n_{ij.},$

      $\sum_{ij} n_{ijk}\lambda_{ij.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}.$

The number of equations is $rs + t - 1$, since $\lambda_{..t}$ does not exist. As before,
a simplification can be effected by solving the top row for $\lambda_{ij.}$ and making a
substitution into the lower one, but because of the great advantage of the
iterative process to be seen further on, we shall not carry out the reduction.

Before going on it might be noted that although this case is three dimensional,
it reduces to the two dimensional Case II if one considers that $ij.$ is one index
running through the values 11, 12, ..., 21, 22, ..., $rs$, and that $..k$ is a second
index running through the values 1, 2, ..., $t$. This can be seen by the
similarity between eqs. (43) and (20).

Case VI: Two sets of face totals known. If in addition to the face totals of
Case IV, the face totals

$N_{.11}, N_{.12}, \ldots, N_{.st}$

are also known from further crossing the $j$ and $k$ characters in the universe, we
shall require

(44)  $\sum_i m_{ijk} = m_{.jk} = N_{.jk}n/N, \qquad j = 1, 2, \ldots, s, \quad k = 1, 2, \ldots, t - 1,$

in addition to the conditions (37). In place of eq. (40) of Case IV we now
find that

(45)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk})$

in which $\lambda_{.jt}$ is to be counted zero for all $j$. No simple relation such as eq. (40)
is possible here, because the adjustment is not proportionate by tubes; the
Lagrange multipliers must be evaluated. This can be accomplished by summing
the members of eq. (45) over $k$ and $i$ in turn, resulting in the normal equations

(46)  $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{.jk} = m_{ij.} - n_{ij.},$

      $\sum_i n_{ijk}\lambda_{ij.} + n_{.jk}\lambda_{.jk} = m_{.jk} - n_{.jk}.$

Since $\lambda_{.jt}$ does not exist for any values of $j$, the number of equations is
$rs + s(t - 1) = s(r + t - 1)$. They break up at once into $s$ sets each of
$r + t - 1$ equations, one set for every $j$ value. In fact, the problem can be
considered as $s$ sets of the two dimensional Case II. Any one value of $j$ gives
a slice, which can be looked upon as fulfilling the specifications of the two
dimensional Case II. Each set of normal equations can be reduced in the same
manner that eqs. (20) were reduced.

Case VII: All three sets of face totals known. All totals now being known,
we require

(37)  $\sum_k m_{ijk} = m_{ij.} = N_{ij.}n/N, \qquad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, s,$

(44)  $\sum_i m_{ijk} = m_{.jk} = N_{.jk}n/N, \qquad j = 1, 2, \ldots, s, \quad k = 1, 2, \ldots, t - 1,$

(47)  $\sum_j m_{ijk} = m_{i.k} = N_{i.k}n/N, \qquad i = 1, 2, \ldots, r - 1, \quad k = 1, 2, \ldots, t - 1.$

The adjusting relation is

(48)  $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk} + \lambda_{i.k})$

in which $\lambda_{.jt}$ is to be counted zero for any $j$, $\lambda_{r.k}$ for any $k$, and $\lambda_{i.t}$ for any $i$.
The normal equations for the Lagrange multipliers are

(49)  $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{.jk} + \sum_k n_{ijk}\lambda_{i.k} = m_{ij.} - n_{ij.},$

      $\sum_i n_{ijk}\lambda_{ij.} + n_{.jk}\lambda_{.jk} + \sum_i n_{ijk}\lambda_{i.k} = m_{.jk} - n_{.jk},$

      $\sum_j n_{ijk}\lambda_{ij.} + \sum_j n_{ijk}\lambda_{.jk} + n_{i.k}\lambda_{i.k} = m_{i.k} - n_{i.k},$

being $rs + rt + st - r - s - t + 1$ in number. They can be reduced in the
same way that previous normal equations have been reduced; but here again,
the iterative process will render the use of normal equations unnecessary, except
for theoretical purposes, e.g. justification of the iterative process.

5. A simplified procedure: iterative proportions. It is well known in least
squares that the number of Lagrange multipliers in any problem is equal to the
number of conditions imposed on the adjustment. Here the conditions have
appeared in sets, depending on which marginal totals are involved. By a
comparison of eqs. (15) and (29) on the one hand, with eqs. (19), (31), (34), (42),
(45), and (48) on the other, we see that wherever there was only one set of
marginal totals involved we came out with a proportionate adjustment, but
that in all other cases it was not so; the Lagrange multipliers involved were
unfortunately related to one another through normal equations. We now make
the observation, however, that as a first approximation the adjustments may
all be considered proportionate, and we shall be able to write down an expression
for the error in this approximation, and shall be able to eliminate it by a
succession of proportionate adjustments.

Take the two dimensional Case II for an example. In eq. (21) one may
recognize $(1/n_{i.})\sum_j n_{ij}\lambda_{.j}$ as a weighted average of $\lambda_{.j}$ for the $i$th row. There
will be a weighted average of $\lambda_{.j}$ for the first row, another for the second, etc.,
one for each value of $i$; consequently one may appropriately speak of the $i$th
average of $\lambda_{.j}$, writing it $i$-av. $\lambda_{.j}$. Substituting from eq. (21) into (19) one
then sees the adjustment (19) appear as

(50)  $m_{ij} = n_{ij}(m_{i.}/n_{i.} + \lambda_{.j} - i\text{-av. } \lambda_{.j}).$

If, on the other hand, $\lambda_{.j}$ had been eliminated from eqs. (20), instead of $\lambda_{i.}$,
the result would have been

(51)  $m_{ij} = n_{ij}(m_{.j}/n_{.j} + \lambda_{i.} - j\text{-av. } \lambda_{i.}).$

From either eq. (50) or (51) it is clear why the adjustment (19) is not
proportionate by rows or columns, and why Case II does not break up into $r$ or $s$ sets
of Case I: the reason is that $\lambda_{.j}$ in any cell is not necessarily equal to the average
$\lambda_{.j}$ for that row, nor is $\lambda_{i.}$ in any cell necessarily equal to the average $\lambda_{i.}$ for
that column. If nevertheless one were to make the simple proportionate
adjustment

(52)  $m'_{ij} = n_{ij}(m_{i.}/n_{i.})$

along the horizontal in the $i$th row, the horizontal conditions (4) will be
enforced but not the vertical ones (5); i.e., it will be found that $m'_{i.} = m_{i.}$, but
that usually not all $m'_{.j} = m_{.j}$. This is because eq. (52) effects only a partial
adjustment, each $m'_{ij}$ being in error through the disparity between the $\lambda_{.j}$ proper
to the $j$th column, and the average of all the $\lambda_{.j}$ for the $i$th row, as seen in
eq. (50). This error can then be diminished by turning the process around and
subjecting these $m'_{ij}$ to a proportionate adjustment in the vertical according to
the equation

(53)  $m''_{ij} = m'_{ij}(m_{.j}/m'_{.j})$

which may be considered an application of eq. (51) wherein the disparity
between any $\lambda_{i.}$ and the average $\lambda_{i.}$ for the $j$th column has been neglected. It is
the vertical conditions that will now be found satisfied, but perhaps not all of
the horizontal ones, because some of the row totals may have been disturbed.
The cycle initiated by eq. (52) is therefore repeated, and the process is
continued until the table reproduces itself and becomes rigid with the satisfaction
of all the conditions, both horizontal and vertical. The final results coincide
with the least squares solution, which is thus accomplished without the use of
normal equations.

Usually two cycles suffice. In practice the work proceeds rapidly, requiring 
only about one-seventh as much time as setting up the normal equations and 
solving them. Tables III-V show the various stages of the work when
the method of iterative proportions is applied to the sample frequencies of 
Table I. It will be noticed that the results of the third approximation (Table V) 
are final, since if the process were continued, the table would only reproduce 
itself. 
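The cycle of eqs. (52) and (53) lends itself readily to mechanical repetition, as the following Python sketch suggests. It assumes the sample frequencies and expected totals of Table I; the function name, the stopping tolerance of half a unit, and the limit on the number of cycles are our own choices, not the authors'. Unlike Tables III-V, no rounding to integers is performed between stages, so individual cells may differ from the printed ones by a unit or so.

```python
def iterative_proportions(n, m_row, m_col, tol=0.5, max_cycles=50):
    """Alternate the row adjustment (52) and column adjustment (53) until the table is rigid."""
    m = [[float(x) for x in row] for row in n]
    for cycle in range(1, max_cycles + 1):
        # Eq. (52): proportionate adjustment along the horizontal.
        for i, row in enumerate(m):
            f = m_row[i] / sum(row)
            m[i] = [x * f for x in row]
        # Eq. (53): proportionate adjustment in the vertical.
        for j in range(len(m_col)):
            f = m_col[j] / sum(row[j] for row in m)
            for row in m:
                row[j] *= f
        # The columns now close exactly; stop once the rows also close within tol.
        if max(abs(sum(row) - t) for row, t in zip(m, m_row)) < tol:
            return m, cycle
    return m, max_cycles


n = [
    [3623, 781, 557, 313],
    [1570, 395, 251, 155],
    [1553, 419, 264, 116],
    [10538, 2455, 1706, 1160],
    [1681, 353, 171, 154],
    [3882, 857, 544, 339],
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]
m_col = [22877, 5285, 3462, 2213]

m, cycles = iterative_proportions(n, m_row, m_col)
print(cycles)
for row in m:
    print([round(x) for x in row])
```

In this example the cells so obtained agree with the adjusted frequencies of Table I to within a unit or two.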

TABLE III

The method of iterative proportions applied to the data of Table I. First stage:
A proportionate adjustment by rows by eq. (52). Note that m'_i. = m_i.,
but that m'_.j ≠ m_.j

            j = 1        2        3        4      m'_i.     m_i.

i = 1        3608      778      555      312      5253      5252
    2        1586      399      254      157      2396      2395
    3        1606      433      273      120      2432      2432
    4       10476     2441     1696     1153     15766     15766
    5        1660      349      169      152      2330      2330
    6        3910      863      548      341      5662      5662

m'_.j       22846     5263     3495     2235     33839
m_.j        22877     5285     3462     2213               33837


TABLE IV

A continuation of the process initiated in Table III. The figures in Table III
are now adjusted proportionately by columns according to eq. (53). The vertical
totals m''_.j and m_.j now are equal, but the agreement of the horizontal totals
accomplished in Table III has been slightly disturbed

            j = 1        2        3        4      m''_i.    m_i.

i = 1        3613      781      550      309      5253      5252
    2        1588      401      252      155      2396      2395
    3        1608      435      270      119      2432      2432
    4       10490     2451     1680     1142     15763     15766
    5        1662      350      167      151      2330      2330
    6        3915      867      543      338      5663      5662

m''_.j      22876     5285     3462     2214     33837
m_.j        22877     5285     3462     2213               33837


TABLE V

The cycle is commenced again. The figures of Table IV are subjected to a
proportionate adjustment by rows, according to eq. (52). And since these results turn
out to be almost a reproduction of Table IV but with both horizontal and vertical
conditions satisfied, they are considered final. The agreement with the m_ij in
Table I should be noted

            j = 1        2        3        4      m'''_i.   m_i.

i = 1        3612      781      550      309      5252      5252
    2        1587      401      252      155      2395      2395
    3        1608      435      270      119      2432      2432
    4       10492     2451     1680     1142     15765     15766
    5        1662      350      167      151      2330      2330
    6        3914      867      543      338      5662      5662

m'''_.j     22875     5285     3462     2214     33836
m_.j        22877     5285     3462     2213               33837


The same process can be extended to three or more dimensions with an even
greater relative saving in time. To see how the method of iterative proportions
applies in one of the three dimensional cases, we may go back to Case III. By
the substitution afforded through eq. (36) the adjusting eq. (34) may be put
into the form

(54)  $m_{ijk} = n_{ijk}(m_{i..}/n_{i..} + \lambda_{.j.} + \lambda_{..k} - i\text{-av. } \lambda_{.j.} - i\text{-av. } \lambda_{..k}).$

Equally well it could have been written

(55)  $m_{ijk} = n_{ijk}(m_{.j.}/n_{.j.} + \lambda_{i..} + \lambda_{..k} - j\text{-av. } \lambda_{i..} - j\text{-av. } \lambda_{..k}),$

or

(56)  $m_{ijk} = n_{ijk}(m_{..k}/n_{..k} + \lambda_{i..} + \lambda_{.j.} - k\text{-av. } \lambda_{i..} - k\text{-av. } \lambda_{.j.}).$

Any of these three equations shows why the adjustment (34) is not
proportional by slices, and why this case does not break up into $r$ or $s$ or $t$ sets of the
three dimensional Case I. As a first approximation it does, as is now clear
from these three equations, and by making successive proportionate
adjustments we may thus arrive at the least squares values. To go about the work
we could first calculate the values of

(57)  $m'_{ijk} = n_{ijk}(m_{i..}/n_{i..}),$

then

(58)  $m''_{ijk} = m'_{ijk}(m_{.j.}/m'_{.j.}),$

followed by

(59)  $m'''_{ijk} = m''_{ijk}(m_{..k}/m''_{..k}).$

These three successive adjustments would constitute a cycle, which would then
be repeated in whole or in part until the table becomes rigid with the
satisfaction of all three sets of conditions.
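The three-step cycle of eqs. (57), (58), and (59) may be sketched in the same manner. The 2 × 2 × 2 sample and the expected slice totals below are hypothetical numbers chosen only so that the three sets of totals add to the same sum; the helper names `total` and `fit_slice_totals`, the tolerance, and the limit on cycles are likewise our assumptions.

```python
def total(m, i=None, j=None, k=None):
    """Sum the cells whose indices agree with the fixed ones (None means summed over)."""
    return sum(
        m[a][b][c]
        for a in range(len(m)) if i in (None, a)
        for b in range(len(m[0])) if j in (None, b)
        for c in range(len(m[0][0])) if k in (None, c)
    )


def fit_slice_totals(n, m_i, m_j, m_k, tol=1e-9, max_cycles=500):
    """Repeat the cycle (57), (58), (59) until all three sets of slice totals are met."""
    r, s, t = len(n), len(n[0]), len(n[0][0])
    m = [[[float(x) for x in tube] for tube in face] for face in n]
    cells = [(i, j, k) for i in range(r) for j in range(s) for k in range(t)]
    for _ in range(max_cycles):
        f = [m_i[i] / total(m, i=i) for i in range(r)]       # eq. (57)
        for i, j, k in cells:
            m[i][j][k] *= f[i]
        f = [m_j[j] / total(m, j=j) for j in range(s)]       # eq. (58)
        for i, j, k in cells:
            m[i][j][k] *= f[j]
        f = [m_k[k] / total(m, k=k) for k in range(t)]       # eq. (59)
        for i, j, k in cells:
            m[i][j][k] *= f[k]
        # The k totals now close exactly; test the other two sets.
        gap = max([abs(total(m, i=i) - m_i[i]) for i in range(r)]
                  + [abs(total(m, j=j) - m_j[j]) for j in range(s)])
        if gap < tol:
            break
    return m


# Hypothetical sample n_ijk and expected slice totals (each set sums to 360).
n = [[[10, 20], [30, 40]],
     [[50, 60], [70, 80]]]
m_i, m_j, m_k = [110, 250], [150, 210], [155, 205]

m = fit_slice_totals(n, m_i, m_j, m_k)
print([round(total(m, i=i), 6) for i in range(2)])
```

Each factor is computed once per slice before the cells are multiplied, so that all cells of a slice are raised or lowered in the same proportion, as eqs. (57) to (59) require.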

6. Simplification when only one cell requires adjustment. On occasions it
happens in sampling work that one is especially interested in one particular cell
of the universe, and would like to have a result for it in advance before the other
cells are adjusted. Sometimes it even happens that the others individually
are of no particular concern. In such circumstances one merely places the cell
of interest in one corner of the table by an appropriate interchange of rows and
columns, and then compresses the rest of the table into the cells adjacent to it.
In the two dimensional Case II one would thus work with a 2 × 2 table, one
corner cell being the one of special interest, the other three being the result of
compression. The marginal totals of the row and column belonging to the cell
of interest are unaffected. For illustration we may suppose that from the
sample shown in Table I we require only $m_{61}$. We then start with the 2 × 2
Table VI, which is derived from Table I by compression. Commencing with
Table VI, one might first adjust by rows according to eq. (52), then by columns
by eq. (53). One cycle of iterative proportions is sufficient, as is seen in Table
VII, and the value 3915 found for $m_{61}$ is in good agreement with its value shown
in Tables I and V. The scheme of compression provides a quick method of
getting out an advance adjustment for a cell of special interest, and the result
so obtained will ordinarily be in good agreement with what comes later when
and if all the cells are adjusted.

TABLE VI

Derived from Table I by compression, the cell i = 6, j = 1, requiring adjustment

               j = 1     j = 2-4      n_i.      m_i.

i = 1-5        18965       9250      28215     28175
i = 6           3882       1740       5622      5662

n_.j           22847      10990      33837
m_.j           22877      10960                33837


TABLE VII

A proportionate adjustment of Table VI

   Rows adjusted by eq. (52)          Columns adjusted by eq. (53)

    18938      9237     28175           18962      9213     28175
     3910      1752      5662            3915      1747      5662
    22848     10989     33837           22877     10960     33837

Conclusion: m_61 = 3915
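The compression and the single cycle of Tables VI and VII may be retraced as follows. The sketch assumes the sample frequencies of Table I; the function names `compress` and `one_cycle` are ours, and the cell of interest is placed in the upper left-hand corner of the 2 × 2 table merely for convenience.

```python
n = [
    [3623, 781, 557, 313],
    [1570, 395, 251, 155],
    [1553, 419, 264, 116],
    [10538, 2455, 1706, 1160],
    [1681, 353, 171, 154],
    [3882, 857, 544, 339],
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]
m_col = [22877, 5285, 3462, 2213]


def compress(n, m_row, m_col, i0, j0):
    """Collapse the table to 2 x 2 with the cell (i0, j0) in the upper left corner."""
    a = n[i0][j0]
    n_i = sum(n[i0])
    n_j = sum(row[j0] for row in n)
    grand = sum(sum(row) for row in n)
    table = [[a, n_i - a], [n_j - a, grand - n_i - n_j + a]]
    rows = [m_row[i0], sum(m_row) - m_row[i0]]
    cols = [m_col[j0], sum(m_col) - m_col[j0]]
    return table, rows, cols


def one_cycle(table, rows, cols):
    """One cycle of iterative proportions: rows by eq. (52), then columns by eq. (53)."""
    step1 = [[x * rows[i] / sum(row) for x in row] for i, row in enumerate(table)]
    col_tot = [sum(row[j] for row in step1) for j in range(2)]
    return [[x * cols[j] / col_tot[j] for j, x in enumerate(row)] for row in step1]


table, rows, cols = compress(n, m_row, m_col, i0=5, j0=0)  # the cell i = 6, j = 1
adjusted = one_cycle(table, rows, cols)
print(round(adjusted[0][0]))  # 3915, as in Table VII
```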

In the three dimensional Cases II, III, V, VI, and VII, one compresses the
original table to a 2 × 2 × 2 table, and then uses the method of iterative
proportions. (The other cases do not require consideration, since they are
proportionate adjustments wherein one is already at liberty to adjust as few or as
many cells as he likes without altering the equations or the routine.) The same
procedure can be extended to the adjustment of two cells, the only modification
being that in two dimensions we shall compress to a 2 × 3 or a 3 × 3 table,
depending on whether the two cells do or do not lie in the same row or column.
In three dimensions we compress to a 2 × 2 × 3, or a 2 × 3 × 3, or a 3 × 3 × 3
table; the first if the two cells lie in the same $i$, $j$, or $k$ tube, the second if they
lie in the same slice but not in the same tube, the third if they are in separate
slices.

7. Some remarks on the accuracy of an adjustment. A least squares
adjustment of sampling results must be regarded as a systematic procedure for
obtaining satisfaction of the conditions imposed, and at the same time effecting
an improvement of the data in the sense of obtaining results of smaller variance
than the sample itself, under ideal conditions of sampling from a stable universe.
It must not be supposed that any or all of the adjusted $m_{ij}$ in any table are
necessarily "closer to the truth" than the corresponding sampling frequencies
$n_{ij}$, even under ideal conditions. As for the standard errors of the adjusted
results, they can easily be estimated for the ideal case by making use of the
calculated chi-square. For predictive purposes, however (which can be regarded
as the only possible use of a census by any method, sample or complete), it is
far preferable, in fact necessary, to get some idea of the errors of sampling by
actual trial, such as by a comparison of the sampling results with the universe,
as can often be arranged by means of controls. There is another aspect to the
problem of error: even a 100 per cent count, even though strictly accurate, is
not by itself useful for prediction, except so far as we can assert on other grounds
what secular changes are taking place.

In conclusion it is a pleasure to record our appreciation of the assistance of 
Miss Irma D. Friedman and Mr. Wilson H. Grabill for putting the formulas 
and procedure into actual operation with census data, and thereby disclosing 
defects in earlier drafts of the manuscript. 

Bureau of the Census, 

Washington 



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


THE STANDARD ERRORS OF THE GEOMETRIC AND HARMONIC 
MEANS AND THEIR APPLICATION TO INDEX NUMBERS 1 

By Nilan Norris 

Attempts to derive useful expressions for estimating the standard deviations 
of the sampling errors of the geometric and harmonic means have not yielded 
results comparable with those afforded by the modern theory of estimation,
including fiducial inference. There are in the literature of probability theory 
certain theorems which can be applied to obtain these desired results in a 
straightforward manner. The use of forms for estimating standard errors is 
subject to certain conditions which are not always fulfilled, particularly in the 
case of time series. An understanding of these limitations should deter those 
who may be tempted to judge the significance of phenomena such as price 
changes solely on the basis of estimated standard errors of indexes. 

1. Statement of formulas. The standard error of the geometric mean of a
sequence of positive independent chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$,
is $\sigma_G = \theta_1 \dfrac{\sigma_{\log x}}{\sqrt{n}}$, where $\theta_1$ is the population geometric mean of the variates;
$\sigma_{\log x}$ is the standard deviation of the logarithms in the population as
given by $\sigma_{\log x} = [E\{[\log x - E(\log x)]^2\}]^{1/2}$; and $n$ is the number of individuals
comprising the sample. The estimate of the standard error of the geometric
mean is $s_G = G \dfrac{s_{\log x_i}}{\sqrt{n - 1}}$, where $G$ is the sample geometric mean, that is, the
estimate of $\theta_1$; $s_{\log x_i}$ is the estimate of $\sigma_{\log x}$; and $n - 1$ is the degree of
freedom of the sample.

1 This article summarizes two papers presented at sessions of the Institute of
Mathematical Statistics at Detroit, Michigan on December 27, 1938, and at Philadelphia,
Pennsylvania on December 27, 1939. The results given herein can be derived by several
methods, which vary somewhat as to degree of rigor. The writer wishes to acknowledge his
indebtedness to the referee for suggesting a proof based on a probability theorem stated
by J. L. Doob, "The limiting distributions of certain statistics," Annals of Math. Stat.,
Vol. 6 (1935), pp. 160-169. The standard deviation formulas obtained follow as an
application of this theorem, as will be seen by reference to it. Obviously the asymptotic variance
formulas of many other statistics (estimates of parameters) can be obtained in a similar
manner.




The standard error of the harmonic mean of a sequence of positive
independent chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$, is $\sigma_H = \theta_2^2 \dfrac{\sigma_{1/x}}{\sqrt{n}}$,
where the population harmonic mean of the variates is $\theta_2 = 1/\alpha = [E(1/x)]^{-1}$;
the standard deviation of $1/x$ in the population is $\sigma_{1/x} = [E\{[1/x - E(1/x)]^2\}]^{1/2}$;
and $n$ is the number of observations comprising the sample. The
estimate of the standard error of the harmonic mean is $s_H = \dfrac{1}{a^2} \dfrac{s_{1/x_i}}{\sqrt{n - 1}}$, where
the estimate of $\alpha$ is given by $a = \dfrac{1}{H} = \dfrac{1}{n}\Bigl(\sum 1/x_i\Bigr)$, in which $s_{1/x_i}$ is the standard
deviation of the reciprocals of the observations comprising the sample; and
$n - 1$ is the degree of freedom of the sample.
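The two estimates may be computed as in the following Python sketch. It rests on an assumption of ours that is not stated explicitly above: the sample standard deviations $s_{\log x_i}$ and $s_{1/x_i}$ are taken with divisor $n$, so that division by $\sqrt{n - 1}$ supplies the correction for the degree of freedom. The function names and the four observations are hypothetical.

```python
from math import exp, log, sqrt
from statistics import pstdev


def geometric_mean_se(xs):
    """Sample geometric mean G and the estimate s_G = G * s_log / sqrt(n - 1)."""
    logs = [log(x) for x in xs]
    g = exp(sum(logs) / len(logs))
    # pstdev uses divisor n (an assumption about the intended definition of s).
    return g, g * pstdev(logs) / sqrt(len(xs) - 1)


def harmonic_mean_se(xs):
    """Sample harmonic mean H and the estimate s_H = s_recip / (a^2 * sqrt(n - 1))."""
    recips = [1.0 / x for x in xs]
    a = sum(recips) / len(recips)  # estimate of alpha = E(1/x), equal to 1/H
    return 1.0 / a, pstdev(recips) / (a * a * sqrt(len(xs) - 1))


xs = [1.0, 2.0, 4.0, 8.0]  # hypothetical positive observations
g, s_g = geometric_mean_se(xs)
h, s_h = harmonic_mean_se(xs)
print(round(g, 4), round(s_g, 4))
print(round(h, 4), round(s_h, 4))
```

With so few observations the estimates have no practical value; the example serves only to show the arithmetic.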


2. Derivation of formulas. These forms can be obtained by application of
the Laplace-Liapounoff theorem 2 as follows: Let $x_i = x_1, x_2, \ldots, x_n$ be a set of
positive independent chance variables with the same distribution functions,
where the expectations $E(x_i)$ and $E(x_i^2)$ exist, and where $\sigma_x^2 = E\{[x_i - E(x_i)]^2\} > 0$.
The last condition is imposed to eliminate the trivial case in which the $x_i$
are all equal and their distribution is confined to a single point. The geometric
mean of the $x_i$ is $G = (x_1 x_2 \cdots x_n)^{1/n}$, and the harmonic mean of the $x_i$ is

$H = \Bigl[\dfrac{1}{n}\sum_i \dfrac{1}{x_i}\Bigr]^{-1}.$

It is necessary to assume that both $\sigma^2_{\log x}$ and $\sigma^2_{1/x}$ are finite, and that in the
case of both $\log x$ and $1/x$ at least one moment of order higher than two of the
respective variates is also finite. The requirement that the variance and at
least one moment higher than the variance be finite can be weakened in various
ways, but this is a trivial consideration, since nearly all distributions of any
importance have finite third moments. 3 Certain rarely occurring types of
distributions, such as the Cauchy distribution, have infinite variance. In such
cases, standard error formulas as ordinarily used are not valid.

Let $E(\log x) = \zeta$, and $E(1/x) = \alpha$. By the Laplace-Liapounoff theorem,
except for terms of order $1/\sqrt{n}$, the limiting distributions of

$\dfrac{\sqrt{n}(\log G - \zeta)}{\sigma_{\log x}}$  and  $\dfrac{\sqrt{n}(H^{-1} - \alpha)}{\sigma_{1/x}}$

are normal with zero arithmetic means and unit variances.
That is, if $C$ represents a set of conditions on chance variables, and $P\{C\}$ is the
probability that these conditions are satisfied, then

2 A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Ergebnisse
der Mathematik und ihrer Grenzgebiete, J. Springer, Berlin, 1933, Vol. II, No. 4, pp. 1-8;
J. L. Doob, op. cit., pp. 160-169; and S. S. Wilks, Statistical Inference, 1936-1937, Edwards
Brothers, Inc., Ann Arbor, 1937, pp. 39 f.

3 For a more detailed discussion of this matter see Wilks, op. cit., pp. 39 f.



GEOMETRIC AND HARMONIC MEANS 


447 


K.p MigotQ-r) c A - lim -«) < 

»-»« { Clot * ) »“*» i film 

In order to use these relations in obtaining the limiting distributions of the 
geometric and harmonic means, it is necessary to suppose that the sequence of 
random chance variables, F,-, converges in probability (converges stochasti¬ 
cally) to p, and that the sequence of random chance variables, y/n(Vi — p), has 
a normal limiting distribution with zero arithmetic mean and variance v . 
Also, it is necessary to assume that the real-valued function, /(x), has a Taylor 
expansion valid in the neighborhood of p. If /'(p) 7 * 0, only the first two terms 
of the series are needed. The required expansion is given by 

m = /(p) + (x - p)/'(p) + - p)i, 

where 0 < /3 < 1, and/"(x) is continuous in the neighborhood of p. When these 
conditions are fulfilled, the limiting distribution of \/n[/(7i) — /(p)] is normal 
with an arithmetic mean of zero and a variance of o- 2 [/'(p)] 2 . 

Let $f(\log G) = e^{\log G}$, and use the expansion given by

\[ e^{\log G} = e^{\zeta} + (\log G - \zeta)\, e^{\zeta} + \tfrac{1}{2}(\log G - \zeta)^2\, e^{\zeta + \beta(\log G - \zeta)} . \]

Since $\theta_1 = e^{\zeta}$, it follows that the limiting distribution of
$\sqrt{n}(G - \theta_1)$ is normal with an arithmetic mean of zero and a variance of
$\theta_1^2 \sigma^2_{\log x}$.

Similarly, it can be shown that the limiting distribution of $\sqrt{n}(H - \theta_2)$ is
normal with an arithmetic mean of zero and a variance of $\theta_2^4 \sigma^2_{1/x}$, where
$\theta_2 = 1/\alpha = [E(1/x)]^{-1}$.
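The two limiting variances above translate directly into large-sample standard errors, roughly $G\, s_{\log x}/\sqrt{n}$ for the geometric mean and $H^2 s_{1/x}/\sqrt{n}$ for the harmonic mean. The following is a minimal sketch of that arithmetic, not part of the original paper. The function names are hypothetical, and replacing the population quantities by sample values (with the standard deviation taken with divisor n) is an assumption made here for illustration.

```python
import math
from statistics import fmean, pstdev


def geometric_mean_with_se(xs):
    """Sample geometric mean G and its large-sample standard error.

    Uses se(G) = G * s_log / sqrt(n), where s_log is the standard deviation
    of log x (divisor n; an assumption of this sketch).
    """
    logs = [math.log(x) for x in xs]
    g = math.exp(fmean(logs))
    return g, g * pstdev(logs) / math.sqrt(len(xs))


def harmonic_mean_with_se(xs):
    """Sample harmonic mean H and its large-sample standard error.

    Uses se(H) = H^2 * s_recip / sqrt(n), where s_recip is the standard
    deviation of 1/x (divisor n; an assumption of this sketch).
    """
    recips = [1.0 / x for x in xs]
    h = 1.0 / fmean(recips)
    return h, h * h * pstdev(recips) / math.sqrt(len(xs))
```

Both functions assume strictly positive observations. As the paper's next section stresses, such standard errors presuppose independent observations and are not reliable for serially correlated series.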

It is of some interest to observe that the expressions for the standard errors 
of the geometric and harmonic means correspond with the forms previously 
given for the standard errors of two efficient ratio-measures of relative variation, 4
namely, 

_ e * a 0s 

0 c /a — <7a/o , ana obiq = ^ vow, 

where $\theta_1/\theta$ is the population geometric-arithmetic ratio, and $\theta_2/\theta_1$ is the
population harmonic-geometric ratio.

3. Limitations of standard-error estimates. Application of these forms is
subject to the usual conditions for drawing sound inferences on the basis of the
representative method. Fiducial argument should be employed to avoid certain
untenable assumptions of the outmoded method of using standard errors.
Estimates of the standard deviations of sampling errors do not constitute an
ultimate test of significance which can be applied with a high degree of success
to all types of problems. In general, such estimates cannot be relied upon with a
high degree of confidence when they are used as tests of significance for index
numbers, since in nearly all time series there exists an appreciable degree of
serial correlation, persistence, or lack of independence among successive items of
any sample.

4 Nilan Norris, "Some efficient measures of relative dispersion," Annals of Math. Stat.,
Vol. 9 (1938), pp. 214-220.

4. Bibliographical note. Certain aspects of the sampling distribution of the 
geometric mean have been discussed by Burton H. Camp. 5 Attempts to derive
forms for estimating the standard errors of index numbers have been made by 
Truman L. Kelley 6 and Irving Fisher, 7 and an empirical study of the sampling 
fluctuations of indexes has been made by E. C. Rhodes. 8 Although various 
special tests of significance for time series have been proposed, 9 at the present 
time no generally satisfactory procedure has appeared. 

Hunter College, 

New York, N. Y. 

5 Burton H. Camp, “Notes on the distribution of the geometric mean,” Annals of Math.
Stat., Vol. 9 (1938), pp. 221-226.

6 Truman L. Kelley, “Certain Properties of Index Numbers,” Quarterly Publications of
Am. Stat. Assn., Vol. 17, New Series 135, Sept., 1921, pp. 826-841.

7 Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company, New York,
1927, 3d ed., pp. 225-229, 342-345, and Appendix I, pp. 407 and 430 f.

8 E. C. Rhodes, “The precision of index numbers,” Roy. Stat. Soc. Jour., Vol. 99 (1936),
Part I, pp. 142-146, and Part II, pp. 367-369.

9 Some of the more recent papers dealing with this matter are: G. Tintner, “On tests of
significance in time series,” Annals of Math. Stat., Vol. 10 (1939), pp. 139-143; “The analysis
of economic time series,” Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 93-100; L. R. Hafstad,
“On the Bartels technique for time-series analysis, and its relation to the analysis of
variance,” Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 347-361; and Lila F. Knudsen,
“Interdependence in a series,” Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 507-514.


A NOTE ON THE USE OF A PEARSON TYPE III FUNCTION IN

RENEWAL THEORY 

By A. W. Brown 

One of the methods suggested by A. J. Lotka 1 for the derivation of the renewal 
function may be briefly summarized as follows. 

The method consists of dissecting the total renewal function into “generations”.
The original installation constitutes the zero generation, the units
introduced to replace disused units of the zero generation constitute the first
generation, renewal of these the second, and so on. Let f(x) be the “mortality”
function, the same for all generations. f(x) is a function satisfying the usual
conditions of a distribution function. Adopting Lotka’s notation, let N be the
number of units in the original collection, $B_1(t)\,dt$ the number of objects
introduced between times t and t + dt and belonging to the first generation, $B_2(t)\,dt$
a similar expression for the second generation, etc. $B_1(t)/N$, $B_2(t)/N$, ... may
be regarded as renewal density functions for the various generations.

1 A. J. Lotka, “A Contribution to the Theory of Self Renewing Aggregates, With Special
Reference to Industrial Replacement,” Annals of Math. Stat., Vol. 10 (1939), p. 1.

Now, evidently, 

\[ B_1(t) = N f(t) \tag{1} \]

\[ B_2(t) = \int_0^t B_1(t - x) f(x)\, dx \tag{2} \]

and in general

\[ B_{j+1}(t) = \int_0^t B_j(t - x) f(x)\, dx. \tag{3} \]

Summation of the contributions of the successive generations gives for the total
renewal at the time t

\[ B(t) = B_1(t) + \int_0^t B(t - x) f(x)\, dx. \tag{4} \]


In this note we propose to use a Pearson Type III function for f(x) and observe
what form our equations then assume. The Pearson Type III function

\[ f(x) = \frac{c^k}{\Gamma(k)}\, x^{k-1} e^{-cx} \]

practical situations. The two parameters c and k give it a considerable amount
of flexibility. The fact that this function has an unlimited range in one direction
is relatively unimportant from a practical point of view, as is well known
from the experience of fitting curves of this type to skewed data with limited
range. Of course the question of whether a Type III curve is appropriate can
be answered more objectively by using the usual Pearson curve-fitting criteria,
$\beta_1$, $\beta_2$ and $\kappa$. We have, then, substituting in (1)


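\[ B_1(t) = \frac{N c^k t^{k-1} e^{-ct}}{\Gamma(k)} \tag{5} \]

and from (2)

\[ B_2(t) = \int_0^t \frac{N c^k (t - x)^{k-1} e^{-c(t - x)}}{\Gamma(k)} \cdot \frac{c^k x^{k-1} e^{-cx}}{\Gamma(k)}\, dx \tag{6} \]

\[ \phantom{B_2(t)} = \frac{N c^{2k} e^{-ct}}{[\Gamma(k)]^2} \int_0^t (t - x)^{k-1} x^{k-1}\, dx. \tag{7} \]

If, now, we set x = ty, the integral in (7) reduces to

\[ t^{2k-1} \int_0^1 (1 - y)^{k-1} y^{k-1}\, dy = t^{2k-1}\, \frac{[\Gamma(k)]^2}{\Gamma(2k)} . \]

Hence,

\[ B_2(t) = \frac{N c^{2k} t^{2k-1} e^{-ct}}{\Gamma(2k)} \tag{8} \]

and in general

\[ B_j(t) = \frac{N c^{jk} t^{jk-1} e^{-ct}}{\Gamma(jk)} . \tag{9} \]

Summing the contributions of the several generations, we have for the total
renewal function

\[ B(t) = N c\, e^{-ct} \left[ \frac{(ct)^{k-1}}{\Gamma(k)} + \frac{(ct)^{2k-1}}{\Gamma(2k)} + \frac{(ct)^{3k-1}}{\Gamma(3k)} + \cdots \right] . \tag{10} \]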

If k is a positive integer > 3, (10) can be easily summed to a form which
shows immediately its damped periodic nature. Even if k is positive but not
an integer, it can be shown by continuity considerations that the function B(t)
defined by (10) has periodic properties.

Assuming k to be a positive integer, then, and setting z = ct, we may write
the expression in brackets in (10) as


\[ \frac{z^{k-1}}{(k-1)!} + \frac{z^{2k-1}}{(2k-1)!} + \cdots = f(z). \tag{11} \]

Then

\[ \frac{d^k f(z)}{dz^k} = f(z) \]

and upon making the trial substitution, $f(z) = A e^{mz}$, we get

\[ A m^k e^{mz} = A e^{mz}. \]

Hence,

\[ m^k = 1. \]

Taking unity in its complex form

\[ 1 = \cos 2n\pi + i \sin 2n\pi \]

we have that

\[ m_n = \sqrt[k]{1} = \cos \frac{2n\pi}{k} + i \sin \frac{2n\pi}{k} \tag{12} \]

where $n = 0, 1, 2, \cdots, k - 1$. Then

\[ f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z} \]

and

\[ f^{(j)}(z) = \sum_{n=0}^{k-1} A_n m_n^{j} e^{m_n z}. \]


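Now setting z = 0, we get

\[ f(0) = A_0 + A_1 + \cdots + A_{k-1} = 0 \]

\[ f'(0) = A_0 m_0 + A_1 m_1 + \cdots + A_{k-1} m_{k-1} = 0 \]

\[ \cdots \]

\[ f^{(k-1)}(0) = A_0 m_0^{k-1} + A_1 m_1^{k-1} + \cdots + A_{k-1} m_{k-1}^{k-1} = 1 \]

k equations to determine the k constants. We know that $A_n$ is equal to the
ratio of two determinants formed from the coefficients of the above equations.
This ratio reduces to

\[ A_n = \frac{(-1)^{k-n-1}}{(m_{k-1} - m_n) \cdots (m_{n+1} - m_n)(m_n - m_{n-1}) \cdots (m_n - m_0)} . \tag{13} \]

We have, then, an expression for the k constants in terms of the k roots of unity.
Therefore, for any particular value of k we can obtain the sum of our series
from the relation

\[ f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z} . \]

Hence, under the assumption that k is a positive integer, we have

\[ B(t) = N c\, e^{-ct} \sum_{n=0}^{k-1} A_n e^{m_n ct} . \tag{14} \]

The forms of B(t) for k = 1, 2, 3, 4 are respectively

\[ B(t) = Nc \]

\[ B(t) = \tfrac{1}{2} N c\, (1 - e^{-2ct}) \]

\[ B(t) = \tfrac{1}{3} N c \left[ 1 - e^{-\frac{3}{2} ct} \left( \cos \tfrac{1}{2}\sqrt{3}\, ct + \sqrt{3}\, \sin \tfrac{1}{2}\sqrt{3}\, ct \right) \right] \]

\[ B(t) = N c\, e^{-ct} \left[ \tfrac{1}{4}(e^{ct} - e^{-ct}) - \tfrac{1}{2} \sin ct \right]. \]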
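The closed forms for integral k can be checked numerically. The sketch below evaluates the finite sum in (14) with the k-th roots of unity from (12) and takes each constant $A_n$ as the reciprocal of the product of differences between $m_n$ and the other roots, which is the Vandermonde solution that (13) expresses. The function name and the default N = 1 are assumptions of this illustration, not notation from the note.

```python
import cmath
import math


def renewal_closed_form(t, k, c, n_units=1.0):
    """B(t) from the finite sum (14), for a positive integer k."""
    z = c * t
    # The k-th roots of unity m_0, ..., m_{k-1} of equation (12).
    roots = [cmath.exp(2j * math.pi * n / k) for n in range(k)]
    total = 0j
    for n, m_n in enumerate(roots):
        # A_n = 1 / prod_{j != n} (m_n - m_j), the solution of the k equations.
        denom = 1 + 0j
        for j, m_j in enumerate(roots):
            if j != n:
                denom *= m_n - m_j
        total += cmath.exp(m_n * z) / denom
    # The imaginary parts cancel in the sum; only the real part is kept.
    return n_units * c * math.exp(-z) * total.real
```

For k = 2 this returns the same values as the second closed form, and for k = 4 the same values as the fourth, up to rounding error.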


Although the above procedure is valuable particularly because it brings to 
light something of the nature of our renewal function, the forms derived above 
can be used actually to obtain values of B(t) for various values of t. However, 
for extensive numerical work a better method is at hand, which does not even 
depend on the assumption of an integral value for k. 

Let us return once again to equation (10), which may be written in the following
form

\[ B(t) = N c \left[ \frac{e^{-ct}(ct)^{k-1}}{\Gamma(k)} + \frac{e^{-ct}(ct)^{2k-1}}{\Gamma(2k)} + \cdots \right] . \tag{15} \]

If k and c are determined by the method of moments, (using two moments), 
k will not, in general, be a positive integer. However, by using the Tables of 
the Incomplete Gamma Function edited by Karl Pearson, one can compute values 
of B(t) without much difficulty. In these tables the function I(u, p) is tabulated 
for various values of u and p, where I (u, p) is defined by 


(16) 


I(u, p) = 


ruv^Ti 

/ e~'v p dv 
Jo _ 

r(p +1) 


If we let £ = U\\/p + 1 = MoVp then upon integrating by parts we find 


(17) 


e~ ( i p 
r(p +1) 


= I(u o, p - 1) -/(uj,p). 


The left hand member of this equation is of the same form as each of the terms 
of the scries in brackets in (15). Hence, the value of the renewal function for a 
particular time, t, is directly obtainable by summation of-the right hand member 
of (17) for successive significant values of the argument p. 

By way of illustration a numerical example will be considered. The data are 
taken from E. B. Kurtz’ book entitled Life Expectancy of Physical Property. 
In this book the author makes a study of retirement rates of fifty-two different 
types of physical property, and finds that their replacement curves fall into seven 
distinct groups. We consider here Group VII which happens to be the largest 
group, embracing seventeen different types of industrial equipment out of the 
fifty-two examined. Using Kurtz’ replacement data 2 we obtain for the value
of the first and second moments

\[ \mu_1 = 10.002 \]

\[ \mu_2 = 121.71 \]

and from these by the method of moments, we find

k = 4.62
c = .462.

We then proceed to calculate values of B(t)/N by means of Pearson's Tables, 3
obtaining the results shown in the following table.

2 E. B. Kurtz, Life Expectancy of Physical Property, Ronald Press, 1930, Table 22, page 86.

3 With regard to the method of interpolation employed in the calculations, it should
be mentioned that it was found advisable to use the Mid-panel Central Difference Formula
(xxiii) on page xii of the introduction to Pearson’s Tables; and that it is quite sufficient
for our purposes to calculate only first order terms.



  t    B(t)/N        t    B(t)/N

  0    .0000        10    .1049
  1    .0016        11    .1043
  2    .0103        12    .1028
  3    .0279        13    .1006
  4    .0486        14    .0990
  5    .0714        15    .0994
  6    .0867        16    .1009
  7    .0980        17    .1013
  8    .1039        18    .0992
  9    .1066        19    .0999
                    20    .0993
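The calculation can be sketched without tables by summing the series (15) directly. This is an illustration only: the note itself works from Pearson's Tables with interpolation, and no claim is made that the sketch reproduces the tabulated figures digit for digit. The function names, the stopping rule, and the reading of the two moments as moments about the origin (so that the first is k/c and the second is k(k+1)/c^2) are assumptions.

```python
import math


def type3_from_moments(m1, m2):
    """Method-of-moments k and c for f(x) = c^k x^(k-1) e^(-cx) / Gamma(k).

    m1 and m2 are taken as the first two moments about the origin,
    so that m1 = k/c and m2 - m1^2 = k/c^2.
    """
    variance = m2 - m1 * m1
    c = m1 / variance
    return m1 * c, c


def renewal_density(t, k, c, tol=1e-12, max_terms=10_000):
    """B(t)/N by direct summation of the series in (15), for t > 0."""
    if t <= 0:
        raise ValueError("this sketch only handles t > 0")
    z = c * t
    total = 0.0
    for j in range(1, max_terms + 1):
        a = j * k
        # Term e^(-z) z^(a-1) / Gamma(a), computed in logarithms for stability.
        term = math.exp(-z + (a - 1.0) * math.log(z) - math.lgamma(a))
        total += term
        # Stop once past the largest term and the terms have become negligible.
        if a > z + 1.0 and term < tol * total:
            break
    return c * total
```

With the moments quoted above, `type3_from_moments(10.002, 121.71)` gives values that round to k = 4.62 and c = .462. For k = 1 and k = 2 the series agrees with the closed forms given earlier, which provides a check on the summation.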


In conclusion the author wishes to thank Professor S. S. Wilks for various 
suggestions he has made in connection with this note. 

Princeton University, 

Princeton, N. J. 


ESTIMATES OF PARAMETERS BY MEANS OF LEAST SQUARES 

By Evan Johnson, Jr. 

As a criterion for comparing estimates of a parameter of a universe, of known 
type of distribution, the use of the principle of least squares is suggested. A 
criterion may be stated in rather general terms. Its application to any given 
problem presumes a knowledge of the distribution functions of the estimates 
considered. In the present paper a criterion is set up and application of it is 
made in the estimation of the mean and of the square of standard deviation of a 
normal universe. 

We shall use the symbol $\theta$ to represent a parameter to be estimated. It is
to be remembered that $\theta$ is a constant throughout any problem, that it represents
an unknown value, and that observations and functions of observations (called
estimates) are the only variables that occur. We shall use the symbols $x_i$,
$i = 1, 2, \cdots, n$, to represent observed values of the variable x of the universe, and
the symbol F to represent a given function of the observations $x_i$.

If we choose to consider a given function F as an estimate of $\theta$, we are then
interested in the error $F - \theta$. This quantity differs from the so-called residual
of least square theory, since we are here interested in the difference between
computed and true values, rather than in the difference between observed and
computed values. To avoid any possible confusion we shall refer to $F - \theta$
as the error. Over the set of all samples of n observations, $x_i$, the distribution
of the errors $F - \theta$ is expressed by means of the distribution function f(F),
which may be computed from the known distribution function of the universe.
We shall assume that the function f(F) has been normalized, so that
$\int_\alpha^\beta f(F)\, dF = 1$, where the interval from $\alpha$ to $\beta$ includes all possible
values of F. The integral $I = \int_\alpha^\beta (F - \theta)^2 f(F)\, dF$, associated with a
given estimate F, may be thought of as the average square error over the set of
all samples.

In this notation we shall state a criterion for the judgment of estimates in 
either of the two following forms: 

Definition 1. Let $f_1$ be the distribution function of $F_1$, and $f_2$ that of $F_2$.
The estimate $F_1$ of $\theta$ will be judged better than the estimate $F_2$ if

\[ \int_\alpha^\beta (x - \theta)^2 f_1(x)\, dx < \int_\alpha^\beta (x - \theta)^2 f_2(x)\, dx . \]

Definition 2. From a given class of functions, of which F is a member, F will
be called the best estimate if

\[ I = \int_\alpha^\beta (F - \theta)^2 f(F)\, dF \tag{1} \]

is less than the corresponding integral for all other functions of the class.

It is to be observed that the integral I is a function of the quantities $\theta$ and f.
From this is seen at once the distinction between the present problem of minimizing
the average square error and the similar problem of finding that point
around which the mean square value of the deviations of a variable is a minimum.
In the problem under consideration we wish to find the function F, or more
precisely its distribution function f(F), for which I takes its minimum with a
fixed value of $\theta$. In the alternative problem we have a given distribution f
and we wish to find the minimum of I with respect to $\theta$.

A second observation to be made is that the integral I can not be usefully
minimized in the sense of the general conditions of the calculus of variations.
The problem would be of the isoperimetric variety, with the side condition
$\int_\alpha^\beta f(x)\, dx = 1$. A solution might be expressed as the limit, as a approaches zero,
of functions f(x) with proper continuity conditions, such that

\[ f(x) = 0 \text{ when } |x - \theta| \geq a, \qquad f(x) > 0 \text{ when } |x - \theta| < a, \qquad \text{and } \int_{\theta - a}^{\theta + a} f(x)\, dx = 1. \]

Such a solution would be meaningless in practical statistical theory. Solutions
are to be expected, therefore, only in those cases where the class of functions,
from which F is to be selected, is sufficiently restricted.

The two following examples illustrate both restrictions and possible application
of the theory.

As a first example let us consider the problem of finding an estimate F of the
mean, $\bar{x}$, of a normal universe. The mean of a distribution is a symmetric
linear function of the variates of the distribution. For the class of functions
from which to select an estimate F of $\bar{x}$, let us take the class of all symmetric
homogeneous linear functions of the observations $x_i$. Let

\[ F = a(x_1 + x_2 + \cdots + x_n). \tag{2} \]

We wish to find the value of a, if any, for which I is a minimum.

F is the sum of n normally distributed independent variables, $a x_i$, each with
standard deviation $a\sigma$. F, therefore, has a distribution function

\[ f(F) = C \exp\left\{ -\frac{(F - an\bar{x})^2}{2na^2\sigma^2} \right\}, \]

where C is so chosen that $\int f\, dF = 1$. A discussion of general distribution functions
may be found in Dunham Jackson's article, “Theory of Small Samples,”
in the American Mathematical Monthly, Volume XLII, 1935. In this case it
can be shown without particular difficulty that

\[ I = a^2 n \sigma^2 + \bar{x}^2 (an - 1)^2 . \]


To determine the minimum of I with respect to a, we set

\[ \frac{dI}{da} = 2an\sigma^2 + 2\bar{x}^2 (an - 1) n = 0, \]

and obtain

\[ a = \frac{\bar{x}^2}{n\bar{x}^2 + \sigma^2} = \frac{1}{n} \cdot \frac{1}{1 + \sigma^2/n\bar{x}^2} = \frac{1}{n} \left( 1 - \frac{\sigma^2}{n\bar{x}^2} + \cdots \right). \tag{3} \]

It is seen that for even such a simple example as the estimation of the mean
there is no estimate of the form of equation (2), with a independent of the parameter
to be estimated, for which I takes its minimum value.

For a distribution in which $\bar{x} \neq 0$, and $\sigma^2/n\bar{x}^2$ is small, a is given as a first
approximation by 1/n. The function F is merely the mean of the sample observations.
If $\bar{x} = 0$, the required solution is a = 0, and there is no best least
square estimate of the type of equation (2).

In the case where $\sigma^2/\bar{x}^2$ is not small, as is apt to be the case when $\bar{x}$ is near
zero, the determination of a desirable estimate by least squares requires a knowledge
of the ratio $\sigma^2/\bar{x}^2$, which may perhaps be judged approximately in a special
problem. If this value is assumed known, the required value of a may be found
most easily by rewriting equation (3) in the form

\[ a = \frac{1}{n + \sigma^2/\bar{x}^2} . \tag{4} \]
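The dependence of the minimizing multiplier on the unknown ratio can be made concrete with a few lines of arithmetic. The sketch below evaluates the average square error of the linear estimate (2) and the multiplier of equation (4). The function names are hypothetical, and treating the universe mean and standard deviation as known inputs is an assumption made purely for illustration.

```python
def mean_square_error_linear(a, n, mean, sigma):
    """Average square error I(a) = a^2 n sigma^2 + mean^2 (a n - 1)^2
    of the estimate F = a (x_1 + ... + x_n) of the universe mean."""
    return a * a * n * sigma ** 2 + mean ** 2 * (a * n - 1.0) ** 2


def best_multiplier(n, mean, sigma):
    """Multiplier from equation (4): a = 1 / (n + sigma^2 / mean^2).

    When the universe mean is zero the minimizing value is a = 0.
    """
    if mean == 0:
        return 0.0
    return 1.0 / (n + (sigma / mean) ** 2)
```

For example, with n = 10, a universe mean of 2 and a standard deviation of 3, the multiplier from (4) is 1/12.25, and it gives a smaller average square error than the ordinary sample mean (a = 1/10). The improvement is available only when the ratio of variance to squared mean is known, which is the restriction the text points out.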


The second example to be considered is the determination of an estimate of
$\sigma^2$ of a normal universe. A comparison with the definition of $\sigma^2$ suggests the
use of a function F given by the equation

\[ F = a \{ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 \}, \tag{5} \]

where $\bar{x}$ is the mean of the n observations. The value of a is, of course, to be
determined by minimizing the integral I.

F is the sum of the squares of n normally distributed but not independent
variables. It may be shown, however, (Jackson, loc. cit.) to be expressible as
the sum of the squares of n - 1 independent normally distributed variables, each
with standard deviation $\sqrt{a}\,\sigma$. The distribution function for F takes the form

\[ f(F) = C\, F^{(n-3)/2} e^{-F/(2a\sigma^2)}, \tag{6} \]

F taking only positive values, and C is again chosen to normalize f(F). The
integral I may be written

\[ I = C \int_0^\infty (F - \sigma^2)^2 F^{(n-3)/2} e^{-F/(2a\sigma^2)}\, dF. \]

The integration is most easily accomplished by replacing F by $u^2$, and in terms
of u

\[ I = C' \int_0^\infty (u^2 - \sigma^2)^2 u^{n-2} e^{-u^2/(2a\sigma^2)}\, du. \]

The various steps in the integration will differ for even and odd values of n,
but in each case the final result is the same. It is found that

\[ I = \sigma^4 \{ a^2 (n^2 - 1) - 2a(n - 1) + 1 \}. \tag{7} \]

The value of a which minimizes I is determined from the relation

\[ \frac{dI}{da} = \sigma^4 \{ 2a(n^2 - 1) - 2(n - 1) \} = 0. \]


Dividing by (n - 1), which is not zero in a sample of two or more observations,
we obtain

\[ a = \frac{1}{n + 1} . \tag{8} \]

In contrast to the previous example we have here an absolute minimum of I
with respect to all estimates of the type of equation (5). The best least square
estimate of this type is, therefore,

\[ F = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n + 1} . \tag{9} \]
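Equation (7) makes it easy to compare the divisor n + 1 with the more familiar divisors n and n - 1. The sketch below simply evaluates (7) at the three multipliers. The function name and the choice n = 10 with unit variance are assumptions for illustration.

```python
def mean_square_error_variance(a, n, sigma2=1.0):
    """Average square error from equation (7):
    I(a) = sigma^4 { a^2 (n^2 - 1) - 2 a (n - 1) + 1 }."""
    return sigma2 ** 2 * (a * a * (n * n - 1) - 2.0 * a * (n - 1) + 1.0)


n = 10
candidates = {"1/(n-1)": 1.0 / (n - 1), "1/n": 1.0 / n, "1/(n+1)": 1.0 / (n + 1)}
errors = {name: mean_square_error_variance(a, n) for name, a in candidates.items()}
```

Substituting into (7) gives $2\sigma^4/(n-1)$, $(2n-1)\sigma^4/n^2$ and $2\sigma^4/(n+1)$ for the three multipliers, so the divisor n + 1 yields the smallest average square error of the three, in agreement with (8).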

Pennsylvania State College, 

State College, Pa. 



THE TEACHING OF STATISTICS 1 

By Harold Hotelling

The very great increase in the teaching of statistics since the First World 
War has been associated on one hand with the development of statistical theory. 
This important series of discoveries has made available more and more powerful
and accurate statistical methods, and has also acquired an intellectual
interest of its own as embodying the modern version of the most important
part of inductive logic and as providing scope for mathematical and logical 
ingenuity of high order. The increased teaching of statistics has also been 
associated with the rapidly growing applications of statistics in innumerable 
fields, made possible by the development of the theory, by the availability of 
persons having some knowledge of the theory, and by an increasing realization 
of the possibilities of application. Doubtless most students of statistics enter 
upon the subject, not for its intrinsic interest, but with the idea of applying 
statistical methods as a tool to some particular end. This object may be 
scientific research, or to fulfill a requirement for a degree, but is often connected 
with some purely practical pursuit offering the ready prospect of a remunerative 
job. But it would be a mistake to ignore those whose interest is more purely 
intellectual, who desire an insight into the peculiar problems of probable inference
and the structure of empirical knowledge, who wish to get a fundamental
acquaintance with one of the most fundamental of subjects, to see and understand
fully the mathematical derivations underlying so much practical and
scientific activity, and perhaps to make their own contributions.

Of the magnitude of the demand for statisticians there can be no doubt. 
The realization of what statistical methods can do in a multitude of fields has 
gradually led the administrators of government agencies, directors of scientific 
organizations and research institutes, and business men, to employ rapidly 
increasing numbers of persons with some knowledge of statistical methods, and 
to accord an unusual degree of recognition and promotion in many such cases. 
The uses of statistical methods, and especially of sampling theory, are so varied 
that it is scarcely possible in a brief space to give any sort of survey of them. 
They enter, in one form or another, into the research work of the physicist, the 
chemist, the astronomer, the biologist, the psychologist, the anthropologist, 
the medical investigator, the economist, and the sociologist. Meteorology, 
which has lately acquired greatly increased importance, both civil and military, 
is with its masses of numerical observations very much a statistical matter. 
The engineer needs modern statistical methods both in the physical and in the
economic aspects of his plans. The work of W. A. Shewhart has made clear
the central importance of sampling theory in the economic control of quality
of manufactured articles. Business men who use sampling surveys to test
the markets for their products and the effectiveness of their advertising, who
employ statisticians to make up index numbers and forecasts of business conditions,
and whose manufacturing costs and quality are controlled with the
help of recently devised statistical methods, are finding more and more uses for
statisticians. Indeed, it seems as if the exploitation of the business and manufacturing
possibilities of statistical methods has only begun, and that limitless
further fields are coming into view. Insurance has of course always been essentially
dependent on statistics.

1 Address at the meeting of the Institute of Mathematical Statistics at Hanover, N. H.,
September 10, 1940.

But the most rapidly growing large class of positions for statisticians is at
present in governmental activities. For some facts regarding the employment
of statisticians by the federal government I am indebted to Dr. J. M. Thompson.
It appears that it has about one hundred agencies using statistics, with
almost eight hundred positions broadly classified as statistical or mathematical,
in addition to more than six thousand generally classified as economists. The
title “economist” covers many types of work, but much of it is largely statistical.
The nature of the government’s statistical work is varied and extensive.
It includes such work as forecasting revenue from taxes, prices and production
of agricultural commodities, general demand conditions, and weather. Some
of the work consists in analyzing the effects of various taxes on other programs.
In connection with proposed legislation, statisticians serving the lawmakers
often attempt to outline the probable results of the legislation, as well as to
assist in setting up definite formulae for carrying out the general policies aimed
at in Acts of Congress. Administrators as well as lawmakers require statistical
activities of a high order, exemplified in the Bureau of the Census, the Bureau
of Agricultural Economics, and others. The scientific activities of the government,
the work of the War Department, and many others that do not at first
sight appear at all statistical, require the services of mathematical statisticians
of high order. Even the judicial activities call for statistical theory of some
of the most recently discovered kinds, as for instance in the investigation recently
made of parole procedures. Cities and states, school and port authorities,
employ numerous statisticians for other and widely diverse purposes.

The growing need, demand and opportunity have confronted the educational
system of the country with a series of problems regarding the teaching of statistics.
Should statistics be taught in the department of agriculture, anthropology,
astronomy, biology, business, economics, education, engineering,
medicine, physics, political science, psychology, or sociology, or in all these
departments? Should its teaching be entrusted to the department of mathematics,
or to a separate department of statistics, and in either of these cases
should other departments be prohibited from offering duplicating courses in
statistics, as they are often inclined to do? To what students, and at what
stage of their advancement, should a course in statistics be administered?
Should there be mathematical or other prerequisites? How much of an investment
in a statistical laboratory is warranted? Should courses be primarily
theoretical and mathematical, or should they be made as practical as possible,
equipping the student in the shortest possible time for a job as statistician, or
for statistical work in the field with which a particular department is concerned?
What about degrees in statistics? Eclipsing all these in importance,
though it seems to have received too little of the attention of college and university
administrative officers is the question, What sort of persons should be
appointed to teach statistics?

To pressing practical problems answers are sure to be given either by considered
policy or by processes of historical evolution. The latter are the more
prominent in explaining the statistical teaching we have had. A synoptic 
picture of the origins, not many decades ago, of a good deal of it would perhaps 
be something like this. A university Department of X, where X stands for 
economics, psychology, or any one of numerous other fields, begins to note 
toward the end of the pre-statistical era that some of the outstanding work 
in its field involves statistics. The quantity and importance of such work are 
observed to increase, while at the same time its intelligibility seems to diminish. 
Evidently students turned out with degrees in the field of X who do not know 
something about statistics are going to be handicapped, and are not likely to 
reflect credit on Alma Mater. The department therefore resolves that its 
students must acquire at least an elementary knowledge of the fundamentals 
of statistics. To implement this principle, it perhaps inserts some acquaintance
with statistics among the requirements for a degree. This situation
naturally calls for the introduction of a course in statistics. Accordingly the 
head of the Department of X, in preparing the next Announcement of Courses, 
writes: 

“X 82. Elements of Statistics. An elementary but thorough 
course designed to acquaint students of X with the fundamental concepts
of statistics and their applications in the field of X. The viewpoint
will be practical throughout. Second semester, MWF at 10.

“Instructor to be announced.” 

The problem now arises of finding someone to teach the new course. The 
few well-known statisticians in the country have positions elsewhere from which 
it would be impossible to dislodge them with the bait to be offered; for though 
the department wishes to have statistics taught as an auxiliary to the study of 
X, it feels that there must be no question of the tail wagging the dog, and that 
economy is appropriate in this connection. The members of the department 
of professorial rank do not respond favorably to the suggestion that they should 
themselves undertake to teach the new and unfamiliar course. But every 
university department has a bright graduate student whose placement is an 
immediate problem. Young Jones has already demonstrated a quantitative turn 
of mind in the course on Money and Banking, or in the Ph.D. thesis on which
he has already made substantial progress, dealing with The Proportion of
Public School Yard Areas Surfaced with Gravel. He may even recall having 
had a high-school course in trigonometry. His personality is all that might 
be desired. He is a white, Protestant, native-born American. And so the 
“Instructor to be announced” materializes as Jones. 

This earnest young scholar now finds that, in addition to completing his 
thesis, he must look up the literature of statistics and prepare a course in the 
subject. His attention is directed by older members of the department to 
some of the research papers in the field of X involving statistics. He pursues 
“statistics” through the library card catalog and the encyclopedias. He reads 
about census and vital statistics, price statistics, statistical mechanics. Perhaps
he encounters probable errors. Eventually he learns that Karl Pearson
is the great man of statistics, and that Biometrika is the central source of infor¬ 
mation. Unfortunately most of the papers in Biometrika and of Pearson's 
writings, while not lacking in vigor, trail off into mathematical discourse of a 
kind with which young Jones feels ill at ease. What he wants is a textbook, 
couched in simple language and omitting all mathematics, to make the subject 
clear to a beginner. Perhaps he finds the impressive books of Yule and Bowley, 
but decides that they are too abstruse. Elderton’s “Frequency Curves and 
Correlation” is far too mathematical. Jones decides that a simple book on 
statistics must be written, and that he will do it if he can ever succeed in mastering
the subject. In the meantime, he contents himself perforce with the less
mathematical writings of Karl Pearson, with applied examples in the field of X, 
and with such nonmathematical textbooks as may have been written by other 
young men who have earlier trod the same path as that on which Jones is now 
beginning. Somehow or other he gets the class through the course. After 
doing this two or three times, Jones is an experienced teacher of statistics, and 
his services are much in demand. His course expands, takes on a settled form, 
and after a while crystallizes into a textbook. At the same time he may be 
getting out some research, consisting of studies in the field of X in which statistical
methods play a part. His promotion is rapid. He becomes a Professor
of Statistics, and perhaps an officer in a national association. His textbook 
has a large sale, and is used as a source by other young men writing textbooks 
on statistics. 

The textbooks written in this way form an interesting literary cycle. Measures
of “central tendency” and of dispersion are introduced, and the use of
one as against another of these measures is debated on every ground except 
the criterion that modern research has shown to be the important one, the 
sampling stability. Sampling considerations, indeed, get little attention. 
The urge to simplify by leaving out the more difficult parts of the subject, and 
especially the mathematical parts, is accompanied by pride in the great number 
of examples drawn from real life, that is, actual data that have been collected. 

But the most fascinating feature of this literary cycle is the opportunity it 
offers for research by the standard methods of literary investigation, tracing the
influence of one author upon another through parallelism of passages, and so
forth. This study is facilitated by the accumulation of errors with repeated 
copying. One outstanding example is in certain formulae connected with the 
rank correlation coefficient, derived originally by Karl Pearson in 1907 and 
copied from textbook to textbook without adequate checking back. As one 
error after another was introduced in this process, the formulae presented to 
students (and apparently made the basis of class exercises involving numerical 
substitution) became less and less like Pearson’s original equations. Incidentally,
in trying to check this original work of Pearson's, recent investigation
has raised the suspicion that it is erroneous; at any rate, he does not give a fully 
adequate argument. Thus it may be that the errors in copying, which are so 
useful in examining the history of statistics, never did any harm. The formulae 
in which the students were drilled may have been no worse than they would 
have been if all the copying had been done with more care. 

While this process has been going on in the Department of X, the Y and Z 
Departments have likewise evolved the teaching of statistics. There is some 
interchange of ideas between the various statisticians on the campus, and there 
is a catholicity in the copying of textbooks. But by and large, statistics is 
regarded in the Economics Department as a branch of economics, in the Psychology
Department as a part of psychology, and so forth. The astronomer is
inclined to resent the suggestion that his students should be called upon to study 
their least squares with anyone but an astronomer. Medical and biological 
investigators suspect Economics and Psychology of charlatanry, and do not 
look with favor on the idea of turning their own students over to such departments
for instruction in statistics. Most unthinkable of all would be putting
the Department of Education in charge of an essential part of the training of 
scientific students. Thus the courses multiply. 

The fact that it is essentially the same fundamental subject that is being 
taught under various names and with various kinds of notation in different 
departments is often concealed by including the teaching of statistical theory 
in a course whose title and prospectus are more suggestive of applications. A 
case in point is that of an economist of my acquaintance, not primarily engaged 
in teaching, who some years ago was invited to give a course in Price Forecasting 
in the Economics Department of a leading university. He carefully prepared a 
series of lectures on this subject, which had been the center of some extended 
research he had conducted. A large class enrolled for the course. But soon 
after beginning his series of lectures the economist noticed that the class was 
growing restive. Upon inquiring what was amiss, he learned that his discourse 
was unintelligible to many of them because he was using technical statistical 
terms and concepts with which they were not familiar. He thereupon undertook
to use simpler language, and when this did not suffice to convey his meaning,
to explain the statistical notions involved in his work on price forecasting.
More and more his lectures came to deal with the elements of statistics, and less 
and less with price forecasting. At the end of the term he felt that he had
given the students some elementary knowledge of statistical theory, for which 
they had not enrolled and for which he did not feel particularly well qualified, 
but had taught them virtually nothing about price forecasting. When the 
invitation was repeated the next year, the economist suggested imposing a course 
in statistics as a prerequisite for the course in Price Forecasting. This, however,
was vetoed by the head of the Economics Department, who did not believe in 
prerequisites. The Price Forecasting course was not repeated. 

This incident illustrates the evolution of a good deal of statistical teaching. 
At the beginning, the idea is to teach some application, but the teacher soon 
finds himself engaged at much more length than expected with the fundamentals 
of statistical theory and methods. In this way it has come about that a large 
number of persons are teaching theoretical statistics who initially had no intention
of doing so, but were concerned with particular applications. The teaching
of statistical theory has been undertaken belatedly and inexpertly because
it was necessary to a discussion of some application originally in view. Thus 
it happens that a good deal of teaching of statistics, even of mathematical 
statistics, masquerades as something else. 

The obvious inefficiency of overlapping and duplicating courses given independently
in numerous departments by persons who are not really specialists
in the subject leads to the suggestion that the whole matter be taken over by the 
Department of Mathematics. This is a promising solution, but it is doomed to 
failure if, as has sometimes happened, it means that the teaching of statistics 
is put under the jurisdiction of those who have no real interest in it. Moreover 
the teaching of statistics cannot be done appreciably better by mathematicians 
ignorant of the subject than by psychologists or agricultural experimenters 
ignorant of the subject. The latter indeed have a certain advantage in that the 
problems seem more real and definite to them; they can sense the difference 
between the important and the unimportant questions, even if they cannot 
express the questions in clear mathematical language, and can sometimes arrive 
intuitively at a correct result that leaves the mathematician puzzled. Also, 
they can understand more readily than can the mathematician the examples, 
drawn largely from biological material, which play so important a part in some 
of the leading expository work on statistics, such as R. A. Fisher’s Statistical
Methods for Research Workers. The pure mathematician has only one advantage
over the non-mathematical worker in empirical fields: he is able to set about
reading the serious literature of statistical theory. But he must still find this 
scattered literature, sort it out from a mass of rubbish, fallacies, and false starts, 
and trace it back historically until he can understand the notation and the presuppositions.
He must also contend with the fact that a good deal that is important
in statistics is still a matter of oral tradition, and some consists of laboratory
techniques. In short, he needs a teacher before he himself sets out to
teach the subject. When a Department of Mathematics calls in a young Ph.D., 
however brilliant, to teach statistics as a part or all of his program, the best 
thing it can do, if he has not already had a training in modern statistics, is to
give him a furlough for a year or two to enable him to go where he can acquire 
such a training. 

Qualifications of a good teacher of statistics include, first and foremost, a 
thorough knowledge of the subject. This statement seems trivial, but it has 
been ignored in such a way as to bring about the present unfortunate situation. 
Mathematicians and others, who deplore the tendency of Schools of Education 
to turn loose on the world teachers who have not specialized in the subjects they 
are to teach, would do well to consider their own tendency to entrust the teach¬ 
ing of statistics to persons who not only have not specialized in the subject, 
but have no sound knowledge of it whatever. A knowledge of theoretical 
statistics is not easy to obtain. There is no comprehensive treatise on the subject,
starting from first principles, and proceeding by sound deductions and
well-chosen definitions to the methods that need to be used in practice. (I 
have been trying for years to write such a treatise, but it has turned out to be a 
bigger task than at first appeared. This is partly because some things formerly 
thought to have been proved turn out, on critical examination, not to be sound, 
and much new research has been necessary.) The literature is scattered through 
journals pertaining primarily to many kinds of applications, and it is only in 
recent years that any large proportion of the current contributions to statistical 
theory and methods have been gathered into a few periodicals devoted to statistical
theory. On the other hand, the seeker after truth regarding statistical
theory must make his way through or around an enormous amount of trash 
and downright error. The great accumulation of published writings on statistical
theory and methods by authors who have not sufficiently studied the subject
is even more dangerous than the classroom teaching by the same people.

A good teacher of statistics needs of course a mathematical background, including
at least an acquaintance with the theory of functions and n-dimensional
Euclidean geometry. A good deal of additional algebra and analysis is likely
to be helpful, as well as some differential geometry. But no amount of such 
mathematics constitutes by itself any approach to sufficiency in the qualifications
of a teacher of statistics. The most essential thing is that the man shall
know the theory of statistics itself thoroughly from the ground up, including 
the mathematical derivations of proper methods and a clear knowledge of how 
to apply them in various empirical fields. In addition to the pure mathematics 
and the knowledge of statistical theory, a competent statistician or teacher of 
statistics needs a really intimate acquaintance with the problems of one or more 
empirical subjects in which statistical methods are applied. This is quite important.
Sometimes excellent mathematicians have wasted time and misled
students through failure to get that feeling for applications that is necessary for 
proper statistical work. 

The theory of statistics has been making advances so rapid and so fundamental 
that some of the first things that need to be said in an elementary course, even 
for prospective practical statisticians, are affected by some of the most recent 
researches. So elementary a question as “What definition is it wise to give to
the term ‘standard deviation’?”, which must be faced by every teacher of 
Statistics 1, requires for an intelligent answer a rather thorough understanding 
of modern sampling theory and techniques. The answer, it now seems, is
not the definition given in most textbooks. In the selection of a statistic to 
represent a parameter, for example in fitting frequency curves or in linkage 
estimation in genetics, the fundamental consideration is connected with the 
sampling distribution, as R. A. Fisher showed in founding the modern theory of
estimation. This is ignored in most of the current teaching of statistics, with 
the result that innumerable students are sent out to waste the money and time 
of their employers by demanding larger samples than are necessary for the purposes
in view, wasting costly information by calculating inefficient statistics
and using tests that are not the most powerful. On the other hand, students of 
statistics who are taught rule-of-thumb methods without their derivations are 
never quite conscious of the exact limitations and assumptions involved, and 
may make unwarranted inferences from samples that are too small or in some 
way violate the conditions underlying the derivations of the formulae. 
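
To illustrate why the choice of a definition is a question of sampling theory, the sketch below compares two common definitions of the sample variance, one dividing the sum of squared deviations by n and the other by n - 1. That these are the two definitions at issue is an assumption made here for illustration only; the text does not say which definition it prefers. The normal population, the function name, the sample size and the seed are likewise hypothetical choices.

```python
import random
import statistics


def average_variance_estimates(sample_size=5, trials=10000, seed=7):
    """Average the divisor-n and divisor-(n - 1) sample variances over many
    samples from a normal population whose true variance is 1 (assumed)."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    total_divisor_n = 0.0
    total_divisor_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        total_divisor_n += statistics.pvariance(sample)         # divides by n
        total_divisor_n_minus_1 += statistics.variance(sample)  # divides by n - 1
    return total_divisor_n / trials, total_divisor_n_minus_1 / trials


if __name__ == "__main__":
    avg_n, avg_n_minus_1 = average_variance_estimates()
    print(f"average with divisor n:     {avg_n:.3f}")
    print(f"average with divisor n - 1: {avg_n_minus_1:.3f}")
```

With samples of five the divisor-n average should fall near 0.8 and the divisor n - 1 average near 1, the true variance. The difference is invisible in any single sample and appears only when the sampling distribution is considered.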

A good teacher of statistics must be thoroughly familiar with these recent 
advances. He must examine very critically textbook statements unsupported 
by full proofs. Even though the students are not capable of following the 
complete mathematical argument—indeed, especially if the students are not to 
examine it—the instructor needs to give it a critical study. The custom of 
omitting proofs, which would not be tolerated in pure mathematics beyond 
a very limited extent, is common in the teaching of statistics, and is excused on 
the ground that the students do not know enough mathematics to understand 
the proofs. Perhaps in some cases a better reason is that the teachers, and the 
authors of the textbooks, do not understand the proofs. In some instances 
no proofs exist, and in some instances no genuine proofs can exist, because the 
methods taught are demonstrably wrong. The custom prevalent in the teaching
of mathematics of going over each proof carefully in the class is, among other
things, a safeguard against infiltration of false propositions. This safeguard is 
missing from most of the teaching of statistics, and there has been an infiltration 
of errors. Since it is accepted that a great many students need to learn something
about statistical methods without learning enough mathematics to understand
the proofs, it follows that the elementary teaching of statistics to these
students must, if the perpetuation of gross errors is to be avoided, be in the 
hands of really competent mathematical statisticians. This is perhaps the 
greatest reform needed in the teaching of statistics today. Until the elementary 
teaching of statistics is conducted by those with a thorough and critical knowledge
of current research in statistical theory, of a sort that seems virtually
inseparable from participation in that research, there is likely to be a continuation
of the laborious drilling of thousands of students in methods that ought
never to be used. Here, of all places, is the great need for participation of 
research workers in elementary teaching. 

Teachers and textbook writers might well abandon the idea of telling what
statistical methods are used, and say instead what methods ought to be used. 
But before they can do this with confidence they must have a very close acquaintance
with the research of the last three decades in statistical theory.

How can an appointing officer know whether a prospective teacher of statistics 
knows his subject? This question requires no answer peculiar to statistics in 
distinction from other subjects. Publication of research, constituting a contribution
to the particular field, has always been accepted as the best proof. A
substantial contribution to fundamental statistical theory, which is to be distinguished
from the mere application of known statistical methods to empirical
data, is the best indication of the kind of scholarship appropriate to a teacher of 
statistics. 

Participation in research is not novel as a criterion of what constitutes a good 
teacher of a college or university subject, if the subject is Greek literature, 
physics, chemistry, biology, or indeed any of those departments that have been 
long enough established to attain with respect to the organization of their teaching
a state approximating equilibrium. The more reputable institutions of
higher learning have long maintained the principle, though with occasional 
violations in practice, that the Ph.D. degree or its equivalent, representing among 
other things the completion of a piece of scholarly research, is a minimum 
condition for a regular faculty appointment. It has usually been maintained 
also that the Ph.D. thesis should be a new contribution of a strictly scholarly 
character to the field of the scholar’s competence, and not merely a routine 
application of known methods to an extraneous field. Thus a thesis offered for 
the Ph.D. degree in mathematics would be judged by its contribution to mathematics,
rather than to physics or accounting. Moreover the regard in which
universities have held members of their faculties has been intimately connected 
with their output of scholarly research. Other criteria of excellence have not 
been ignored, but research has been recognized in a fairly consistent manner. 
Some say that there has been an over-emphasis on research, and that more attention
ought to be given to other qualities related to teaching. However
this may be, the facts remain that scholarly research is something capable 
of a reasonably objective evaluation by scholars in the field, that it offers the 
main hope of fundamental progress, and that familiarity with current research 
is a necessary, though not sufficient, condition for the most important teaching 
in institutions of higher learning. 

A peculiarity of the teaching of statistics, of which in practice the theory of 
statistics is an essential even if unacknowledged part, is that a good deal of it 
has been conducted by persons engaged in research, not of a kind contributing to 
statistical theory, but consisting of the application of statistical methods and 
theory to something else. A similar situation would exist if the teaching of 
mathematics were in the hands of an assortment of various kinds of engineers, or 
if zoology and botany were taught by practicing physicians. The teaching of 
mathematics and of elementary biology might perhaps gain in liveliness and 
concreteness by such arrangements, with the accompanying emphasis on the
particular applications of the fundamental sciences. Moreover the engineer 
might in the course of such teaching refresh his own knowledge of elementary 
mathematics, while the physician might gain by renewing his acquaintance with 
elementary biology. Such arrangements might occasionally be made with 
profit. But if they were the general rule the advantages of specialization would 
be lost; the fundamental sciences would not be developed in so well-rounded a 
manner as they are by specialists in them, while the special skills and knowledge 
of the physician and engineer could not be utilized to the full in their respective 
professions. Statistical theory is a big enough thing in itself to absorb the full-time
attention of a specialist teaching it, without his going out into applications
too freely. Some attention to applications is indeed valuable, and perhaps 
even indispensable as a stage in the training of a teacher of statistics and as a 
continuing interest. But particular applications should not dominate the 
teaching of the fundamental science, any more than particular diseases should 
dominate the teaching of anatomy and bacteriology to pre-medical students. 
These subjects are not ordinarily taught by practicing physicians, but by anatomists
and bacteriologists respectively.

In medical education the principle has been accepted, after a long struggle, 
that a medical school should have full-time professors engaged primarily in 
teaching and research, and that such professors should not treat patients except 
in cases of unusual interest from the standpoint of the science or art of medicine. 
An analogous principle would be that an institution offering extensive instruction
in statistics should have full-time professors engaged in the teaching of and
research in statistical theory and methods, without spending time over applied 
statistical problems excepting insofar as such problems might present novel 
features calling for the development of new statistical methods or theoretical 
extensions having interest going beyond the immediate case. Sometimes the 
complaint is heard in medical schools that the teaching tends to become too 
theoretical on account of detachment from clinical practice, and a similar difficulty
might conceivably develop in connection with statistics; but in neither
case does the trouble seem to be beyond the ability of the personnel involved to 
cure if they have the right background. 

A specialist in statistics on a university faculty has a threefold function. In 
addition to the usual duties of teaching and research, there is a need for him to 
advise his colleagues, and other research workers, regarding the statistical 
methods appropriate to their various investigations. The advisory function is 
a highly important one for the activities of the university as a whole, and should 
be taken into consideration in adjusting the teaching load. Probably every 
university statistician is visited from time to time by earnest research workers, 
deeply engrossed in their respective specialities, speaking technical jargons unfamiliar
to the statistician, and seeking his advice on matters concerning which
he has a sinking feeling of lack of comprehension. After some hours of psychoanalyzing
his visitor the statistician may be able to ascertain what it is he really
wants to know, and thereafter either refer him to some standard formula, or
more often, undertake a piece of new mathematical research designed to fit 
the particular problem, and very possibly having value also for a more extended 
class of problems. The statistician is then very likely to find himself embarked 
on a co-operative research venture in a field that is new to him. 

To function well in this third, the consultative or co-operative function, he 
must have an unusually large store of general information. No one stands in 
greater need than he of that knowledge of “something about everything and 
everything about something” that was once said to be the goal of a liberal 
education. In planning the education of statisticians and teachers of statistics 
these considerations point to a somewhat wider diffusion of studies among various
fields than is customary in many institutions, especially in graduate work.
The co-operation, and their other work, would also be facilitated if research 
workers in general were more strongly urged to get a training in mathematical 
statistics at an early stage in their careers. 

The problem of departmental organization is secondary to that of getting men 
having the requisite qualities of extensive mathematical preparation, a thorough 
knowledge of modern theoretical statistics, an understanding of some fields at 
least in which statistical methods can be applied, and the type of inquiring 
mind sometimes described as a “research outlook.” A Department of Mathematics
may well handle the fundamental teaching in statistics, provided it has
men properly qualified for such teaching. If it does not have such men, its 
teaching of statistics and its inability to provide the needed statistical advice 
will inevitably tempt the other departments to set up again their own duplicating
courses in what amounts essentially to statistical theory and methods, and
to repeat the mistakes of the past. 

A separate Department of Statistics, if competently staffed, could very well 
provide advice for the whole institution as well as conducting elementary instruction
in statistical methods and theory, both for students having calculus
and for those without it, and should certainly carry on advanced teaching and 
research in statistical theory and methods. But for efficient functioning of the 
institution as a whole it should be agreed that the Department of Statistics or 
the Department of Mathematics should do all the elementary instruction in 
statistics, and that courses in statistics in other departments should be confined 
to applications of the basic theory. Normally such courses in applied statistics 
in the other departments should require as a prerequisite one or more of the basic 
courses in the Department of Statistics, or of Mathematics. The basic course 
to be required as a prerequisite to others should be the one which itself requires 
calculus as a prerequisite wherever this is practicable. It is practicable for 
students of engineering, physics, astronomy, and mathematical economics, since 
these students must have calculus anyhow. Moreover the value of the sequence
consisting of calculus, statistical theory and applied statistics, in this
order, is so great that many other students are likely to avail themselves of it 
when it is once established and the true nature and value of statistics are more 
widely understood.

Exactly how far a Department of Statistics should go in particular applications
would have to be decided anew from time to time by its members in the
light of changing conditions and interests. It cannot teach everything that goes 
by the name of statistics. This problem may be exemplified by the case of 
population and vital statistics. This is a field with close connections with sociology,
biology, medicine and insurance. It is cultivated in conjunction with
each of these subjects in various places. Some of its most interesting and important
phases make use of quite advanced mathematics, as in the work of
A. J. Lotka, and in addition there is extensive use, and more extensive need, of 
the statistical methods centered around sampling theory which are the appropriate
domain of a Department of Statistics. Should the study of population
and vital statistics be included in a Department of Statistics? I think not, 
except as a temporary arrangement, or in a small institution, in spite of the 
history of the word “statistics,” which originated in connection with material 
of this kind, and in one of its meanings is still applied to it. (My use of the 
unqualified word “statistics” in this paper is in the sense of theory and methods, 
not in the sense of statistical facts such as those found by the census.) Medical, 
biological and sociological considerations are prominent in the problems of vital 
statistics, and one of these departments might well handle the subject. But 
the vital statistician, like other research workers, should have acquired in the 
course of his training an intimate familiarity with the statistical theory and 
methods which are the appropriate province of a Department of Statistics. 
He also needs mathematics through integral equations, if he is to understand and 
extend the contributions of Lotka and Volterra. Students of vital statistics 
should have had an elementary course in statistical theory in the Department of 
Statistics, preferably the course requiring calculus. 

A course in price statistics should be taught by an economist, presumably in 
the Department of Economics, but might well require as a prerequisite the same 
elementary courses in statistical theory and methods as would be required in 
psychology, medicine and other fields. In addition, there are problems of time 
series analysis whose treatment calls for a mathematical statistician having some 
acquaintance with both economic and meteorological data. A course on the 
treatment of time series might appropriately be included in the Department of 
Statistics, requiring the general elementary course as a prerequisite, and itself 
serving as a prerequisite for courses in economic and meteorological statistics. 

One of the chief obstacles to efficient organization of teaching is the habit of 
not prescribing prerequisites outside one’s own department. But when once 
the elementary courses in statistics have become established in the hands of well-equipped
specialists in statistical theory and methods, in whose competence
general confidence can be reposed, the various departments of application will 
lose their motive for establishing their own duplicating courses, and will be able 
to cultivate more intensively their respective specialities. 

The detection of biases and the details of practical statistical work vary greatly
from one application to another. These, consequently, are matters for the departments
concerned with applications rather than with the fundamentals of
statistics, and should not be the chief features of a course in elementary statistical
methods and theory. The work of a Department of Statistics should be
concerned largely with sampling theory, and should emphasize the unity of 
statistical methods and theory, regardless of the field of application. It should 
deal with statistics as a coherent science of inductive inference, of the preparation
of observations for inference, and of the planning of investigations so as to
yield observations from which inferences can best be made. 

The question what mathematical prerequisites should be established for the 
fundamental course in statistical theory must be answered by a compromise 
between the ideal and what is expedient at a particular time and place. In 
Europe a large number of students have had a year of calculus before coming to 
universities, that is, before reaching the age of eighteen. If a university were 
willing to restrict its entrants to such students (thus automatically solving the 
problem of overcrowding) it could give them another year of calculus, mixed 
perhaps with advanced algebra and geometry, and then in their sophomore year 
give them a thorough course in elementary statistics and probability, based on 
calculus. These students would then be ready to tackle advanced statistics in 
the third year in a really effective way. If the teaching of economic theory, 
physics, chemistry and astronomy were geared to this program in such a way as 
to make real use of the calculus, the work in these subjects could be made far 
more efficient, in the sense that more material could be covered effectively in 
the allotted time, or an equivalent amount of material in less time. If, in addition,
all the many departments in which statistical methods and theory are used
required these statistical courses as prerequisites, and actually used the materials
of these courses in their work, there would be a further huge gain in efficiency.
The baccalaureate degree of such an institution would represent a far
more thorough knowledge, and command of the tools of research, than is possible 
without an arrangement putting in this way the fundamentals first. 

Institutions unwilling to undertake such a drastic improvement must face 
more or less delay and inadequacy in the acquisition by their students of the 
fundamentals of mathematics and of statistics. A division of the students into 
groups according to mathematical ability ought to be undertaken, and followed 
by a corresponding division of the elementary statistics course. Students having 
high mathematical ability could begin the study of statistics after completing 
calculus, and could look forward to rising ultimately to greater heights in pursuits
involving mathematical or statistical knowledge than those of lesser mathematical
talents. For these latter there would still be the possibility of acquiring,
even without calculus, useful statistical tools; but it is essential that this
should be done under the guidance of instructors thoroughly familiar with the 
mathematics of statistics. The task of leading the blind must not be turned 
over to the blind. Students possessing the ability to master the calculus should
be encouraged to begin the study of statistics with the course having calculus 
as a prerequisite, and should not be put into the necessarily slower group not 
having the calculus. I believe that these elementary courses should begin with 
the theory of probability, but should go on to the chief distribution functions 
used in practice, and should include applied problems and work on calculating 
machines. 

Putting a sound program of statistical teaching into effect will take time, 
partly because of the scarcity of suitable teachers of statistics. Nevertheless, 
the process is well under way, and the prospects are good for substantial improvements
in the teaching of statistics. A body of able young research men
possessing the requisite knowledge of statistical fundamentals is now in existence 
and is growing. Some of the recent textbooks represent striking improvements. 
The Institute of Mathematical Statistics itself, with the Annals of Mathematical 
Statistics, is perhaps the best evidence of a changed view making for better 
things. 

Columbia University,
New York, N. Y.


DISCUSSION OF PROFESSOR HOTELLING’S PAPER 
By W. Edwards Deming 

It is a pleasure to endorse Professor Hotelling’s recommendations; in fact we 
have been following them pretty closely in the courses in the Graduate School 
of the Department of Agriculture. As a matter of fact, he has indirectly played 
an influential part in building up this set of courses, because some of our best 
instructors are his former students. 

Listening to Professor Hotelling’s paper, I was thinking of the possibility 
that some of his recommendations might be misunderstood. I take it that they 
are not supposed to embody all that there is in the teaching of statistics, because 
there are many other neglected phases that ought to be stressed. In the Bureau 
of the Census the population division alone has augmented its force by
approximately 3500 statistical clerks during the past six months. They come from
diverse schools and it has been interesting to observe how many of them have the
idea that all the problems of sampling and inference from data can be solved by
what are commonly known as modern statistical techniques—correlation
coefficients, rank correlation coefficients, chi-square, analysis of variance,
confidence limits, and the like. Most of them are shocked to learn that many of
the so-called modern “theories of estimation” are not theories of estimation at
all, but are rather theories of distribution and are a disappointment to one who is 
faced with the necessity of making a prediction from his data, i.e., of basing
some critical course of action on them. The conviction that such devices as
confidence limits and Student’s t provide a basis for action regardless of the 
size of the sample whence they were computed, even under conditions of
statistical control, is too common a fallacy. On the other hand, many simple but
worthy devices are neglected. A histogram, for instance, can be a genuine
tool of prediction if it is built up layer by layer in different legends so as to
distinguish the different sources whence the data are derived. The modern student,
and too often his teacher, overlook the fact that such a simple thing as a scatter
diagram is a more important tool of prediction than the correlation coefficient,
especially if the points are labeled so as to distinguish the different sources of the
data. Most students do not realize that for purposes of prediction the
consistency or lack of it between many small samples may be much more valuable
than any probability calculations that can be made from them or from the entire
lot. Students are not usually admonished against grouping data from
heterogeneous sources. Of those that are not guilty of indiscriminate grouping, many
are inclined to rely on statistical tests for distinguishing heterogeneity, rather 
than on a careful consideration of the sources of the data. Too little attention 
is given to the need for statistical control, or to put it more pertinently, since 
statistical control (randomness) is so rarely found, too little attention is given 
to the interpretation of data that arise from conditions not in statistical control. 
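
The danger of grouping data from heterogeneous sources may be illustrated by a small numerical sketch. The figures below are invented purely for illustration, and the helper names are hypothetical. Within each of the two sources the variables move in opposite directions, yet the correlation coefficient computed from the pooled lot is positive.

```python
from math import sqrt

def pearson_r(pairs):
    """Ordinary product-moment correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented observations from two distinct sources (assumption, not real data).
source_a = [(1, 9), (2, 8), (3, 7), (4, 6)]
source_b = [(6, 14), (7, 13), (8, 12), (9, 11)]

r_a = pearson_r(source_a)                  # -1 within source A
r_b = pearson_r(source_b)                  # -1 within source B
r_pooled = pearson_r(source_a + source_b)  # about +0.67 for the pooled lot

print(round(r_a, 2), round(r_b, 2), round(r_pooled, 2))
```

A scatter diagram with the points labeled by source would show the same thing at a glance, which is the point made above; the coefficient for the pooled lot conceals it.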

Nevertheless, the fundamentals of probability and sampling theory, and the 
mathematics of the distribution functions, though by themselves they do not 
qualify anyone for high-grade statistical work, are ultimately essential for
proficiency in statistics. Since they are seldom learned away from the university
they are properly made the main theme of teaching. The university is the 
place to learn the studies that are so difficult to get outside of it. 

Above all, a statistician must be a scientist. The skepticism of many first 
class scientists of today for modern statistical methods should be a challenge to
statistical teaching. A scientist does not neglect any pertinent information, 
yet students of statistics are often taught to do just the opposite of this, and are 
accused of being old-fashioned for daring to think of combining experience with 
the new information provided by a sample, even if it is a pitifully small one. 
Statisticians must be trained to do more than to feed numbers into the mill and 
grind out probabilities; they must look carefully at the data, and take account 
of the conditions under which each observation arises. It is my feeling that 
the chief duty of a statistician is to help design experiments in such a way 
that they provide the maximum knowledge for purposes of prediction; another 
is to compile data with the same object in view; and still a third function is 
to help bring about some changes in the source of the data. Scientific data 
are not taken merely for inventory purposes. There is no use taking data if 
you don’t intend to do something about the sources whence they arise. 

Bureau of the Census, 

Washington 





RESOLUTIONS ON THE TEACHING OF STATISTICS 

The Institute of Mathematical Statistics at its business meeting on September 
11, 1940, at Dartmouth College adopted the following resolutions regarding the
teaching of statistics. The resolutions were drawn up by a committee appointed 
by the President, and consisting of Burton H. Camp, W. Edwards Deming, 
Harold Hotelling, and Jerzy Neyman. 

1. If the teaching of statistical theory and methods is to be satisfactory, it 
should be in the hands of persons who have made comprehensive studies of the 
mathematical theory of statistics, and who have been in active contact with 
applications in one or more fields. 

2. The judgment of the adequacy of a teacher’s knowledge of statistical 
theory must rest initially on his published contributions to statistical theory, in 
contrast with mere applications, in a manner analogous to that long accepted in 
other university subjects. 

3. These ideas are expressed in detail in the paper The teaching of statistics, 
by Professor Harold Hotelling, and the Institute decides to give both the 
resolution and the paper as wide a circulation as possible. 



REPORT OF THE HANOVER MEETING OF THE INSTITUTE 

The sixth meeting of the Institute of Mathematical Statistics was held at 
Dartmouth College, Hanover, New Hampshire, Tuesday to Thursday,
September 10 to 12, 1940, in conjunction with meetings of the American
Mathematical Society and of the Mathematical Association of America. The
following forty-two members of the Institute attended the meeting:

H. E. Arnold, Felix Bernstein, G. W. Brown, J. H. Bushey, B. H. Camp, A. T. Craig,
A. R. Crathorne, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob, Churchill Eisenhart,
M. L. Elveback, C. H. Fischer, M. M. Flood, R. M. Foster, T. C. Fry, H. P. Geiringer,
Robert Henderson, E. H. C. Hildebrandt, G. M. Hopper, Harold Hotelling, E. V. Huntington,
M. H. Ingraham, Dunham Jackson, W. L. Kichline, L. F. Knudsen, B. A. Lengyel,
W. G. Madow, J. W. Mauchly, Richard von Mises, E. B. Mode, Jerzy Neyman, P. S. Olmstead,
Oystein Ore, M. M. Sandomire, L. W. Shaw, F. F. Stephan, A. G. Swanson, Abraham
Wald, S. S. Wilks, Jacob Wolfowitz.

The meeting of the Institute consisted of four sessions. At the first session, 
which was held on Tuesday morning, Professor Harold Hotelling of Columbia 
University delivered an address on The Teaching of Statistics. This address
was followed by considerable discussion on the various aspects of the teaching
of statistics.¹ Preceding Professor Hotelling’s address a short paper on an
Empirical Comparison of the “Smooth” test for goodness of fit with Pearson’s 
Chi-Square test was presented by Professor J. Neyman of the University of 
California. 

Following Professor Hotelling’s address a business meeting of the Institute 
was held. At this time resolutions on the teaching of statistics were approved 
(see p. 472). The President reported that a War Preparedness Committee 
had been appointed in the summer to study the matter of the Institute’s
participation in the national defense program.² The Chairman of this Committee
submitted a preliminary report which met the approval of the Institute. A 
plan was approved for completing the report and circularizing it with a minimum 
of delay. 

The matter of the organization of local sections or chapters of the Institute
was discussed but no action was taken. 

¹ Professor Hotelling’s address and three resolutions regarding the teaching of
Statistics which were adopted by the Institute at a business meeting following the address are
published in the present issue of the Annals of Mathematical Statistics, pp. 457-472.

² The membership of the Committee is as follows:

Professor Churchill Eisenhart (Chairman), University of Wisconsin. 

Professor A. T. Craig, University of Iowa. 

Professor E. G. Olds, Carnegie Institute of Technology. 

Captain Leslie E. Simon, Aberdeen Proving Ground. 

Mr. Ralph E. Wareham, General Electric Company. 



On Tuesday afternoon a session on contributed papers in Mathematical 
Statistics was held jointly with the American Mathematical Society.
Professor B. H. Camp of Wesleyan University presided and the following papers
were presented: 

1. Contributions to the theory of the representative method of sampling.

Dr. W. G. Madow, Department of Agriculture, Washington.

2. A generalization of the law of large numbers.

Dr. Hilda P. Geiringer, Bryn Mawr College.

3. On the problem of two samples from normal populations with unequal variances.

Professor S. S. Wilks, Princeton University. 

4. Experimental determination of the maximum of an empirical function. 

Professor Harold Hotelling, Columbia University. 

5. Asymptotically shortest confidence intervals. 

Dr. Abraham Wald, Columbia University. 

6. Reduction of certain composite statistical hypotheses. 

Dr. G. W. Brown, R. H. Macy and Company, Inc., New York. 

7. Conception of equivalence in the limit of tests and its application to certain $\lambda$ and $\chi^2$ tests.

Professor J. Neyman, University of California. 

Abstracts of these papers follow this report. 

On Wednesday morning a session was held on The Theory of Probability 
with Dr. T. C. Fry of the Bell Telephone Laboratories, in the chair. The 
following addresses were given: 

1. On the foundations of probability theory. 

Professor R. von Mises, Harvard University. 

2. Probability as measure. 

Professor J. L. Doob, University of Illinois. 

This session was followed by an energetic discussion which was continued in an 
informal afternoon session. 

The Thursday morning session was devoted to the Theory of Statistical
Estimation with Professor Harold Hotelling as Chairman. The following addresses
were given: 

1. Estimation by intervals as a classical problem in probability. 

Professor J. Neyman, The University of California. 

2. Statistical estimation in large samples.

Dr. Joseph F. Daly, The Catholic University of America.

On Monday at 4:15 p.m. a tea was held at the Graduate Club for members 
of the mathematical organizations and their guests, and on Monday at 8:00 a 
musical performance was presented. On Tuesday at 7:00 p.m. a joint dinner 
was held for the mathematical organizations in Thayer Hall. Wednesday 
afternoon was devoted to an excursion to Franconia Notch. 

During the meeting a collection of string models of ruled surfaces was
exhibited by Professor Robin Robinson of Dartmouth College and electrical
calculation apparatus made from telephone equipment was exhibited by
members of the staff of the Bell Telephone Laboratories.



ABSTRACTS OF PAPERS 

(Presented on September 10, 1940, at the Hanover meeting of the Institute) 

Contributions to the Theory of the Representative Method of Sampling. 

William G. Madow, Washington, D. C. 

The theory of representative sampling may be regarded as a dual sampling process; the 
first of which consists in the sampling of different random variables and the second of which 
consists in repeating several times the experiments associated with each of the different 
random variables. It follows that while the theory of sampling from finite populations 
without replacement may be required for the first process, the second leads directly into 
the theory of sampling from infinite populations. There is, however, one difference. 
Although the usual theory is concerned with the evaluation of fiducial or confidence limits
for parameters, the theory of sampling is concerned with the evaluation of fiducial or
confidence limits for, say, the mean of a sample of N, when n (N > n) of the values are known.

It is thus possible to use the usual theories of estimation in obtaining estimates of the 
parameters and to allow the effects of subsampling process to show themselves in the 
different values of the fiducial limits. It is shown that the limits obtained are almost 
identical with those obtained by the theory of sampling from a finite population.
Distributions of the statistics used in these limits are derived.

Besides these results, the theory is extended to the theory of sampling vectors, and
conditions are stated under which the “best” allocation of the number in a sample among several
strata is proportional to the kth roots of the generalized variance of a random vector
having k components. 

A Generalization of the Law of Large Numbers. Hilda Geiringer, Bryn 
Mawr. 

Let $V_1(x), V_2(x), \ldots, V_n(x)$ be $n$ probability distributions which are not supposed to
be independent and let $F(x_1, x_2, \ldots, x_n)$ be a “statistical function” of $n$ observations
in the sense of v. Mises, $V_i(x)$ $(i = 1, 2, \ldots, n)$ indicating as usual the probability of
getting a result $\le x$ at the $i$th observation. Then it can be proved that under fairly
general conditions $F(x_1, x_2, \ldots, x_n)$ converges stochastically toward its “theoretical
value”; or in other words, that under these general conditions a great class of statistics
$F(x_1, x_2, \ldots, x_n)$ is “consistent” in the sense of R. A. Fisher.

Well known particular cases of this theorem result if (a) we take for $F(x_1, x_2, \ldots, x_n)$
the average $(x_1 + x_2 + \cdots + x_n)/n$ of the $n$ observations, (b) we assume that the $V_i(x)$
are independent distributions.

On the Problem of Two Samples from Normal Populations with Unequal Vari¬ 
ances. S. S. Wilks, Princeton University. 

Suppose $O_{n_1}$ and $O_{n_2}$ are samples of $n_1$ and $n_2$ elements from normal populations $\pi_1$ and
$\pi_2$ respectively. Let $a_1, \sigma_1^2$ and $a_2, \sigma_2^2$ be the means and variances of $\pi_1$ and $\pi_2$ and let
$O_{n_1}$ and $O_{n_2}$ have means $\bar{x}_1$ and $\bar{x}_2$ and variances $s_1^2$ and $s_2^2$ (unbiased estimates of $\sigma_1^2, \sigma_2^2$)
respectively. It is shown that there exists no function (Borel measurable) of $\bar{x}_1, \bar{x}_2,
s_1^2, s_2^2, a_1 - a_2$ independent of $\sigma_1$ and $\sigma_2$, having its probability law independent of the
four population parameters. It is therefore impossible to obtain exact confidence limits
for $a_1 - a_2$ corresponding to a given confidence coefficient. Functions of the four
parameters and four statistics are devised from which one can set up confidence limits for $a_1 - a_2$
with associated confidence coefficient inequalities.

Experimental Determination of the Maximum of an Empirical Function. 

Harold Hotelling, Columbia University. 

In physical and economic experimentation to determine the maximum of an unknown 
function, for example of a monopolist’s profit as a function of price, or of the magnetic 
permeability of an alloy as a function of its composition, the characteristic procedure is to 
perform experiments with chosen values of the argument x, each of which then yields an
observation, subject to error, on the corresponding functional value y = f(x). The values
of x need, however, to be chosen on the basis of earlier experiments in order to make the 
determination efficient. The experimentation properly proceeds, therefore, in successive 
stages, with the values used at each stage determined with the help of the earlier work. 
The question what distribution of x as a function of previous results should be used is 
discussed in this paper on the basis of various hypotheses regarding the function, and 
further criteria. In particular, a conflict is shown to exist under some conditions between 
the criterion of minimum sampling variance and that calling for absence of bias. 

Asymptotically Shortest Confidence Intervals. Abraham Wald, Columbia 
University. 

Let $f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown
parameter $\theta$. Denote by $x_1, \ldots, x_n$ $n$ independent observations on $x$ and let $C_n(\theta)$ be a
positive function of $\theta$ such that the probability that

$$\left| \frac{1}{\sqrt{n}} \frac{\partial}{\partial\theta} \sum_{\alpha=1}^{n} \log f(x_\alpha, \theta) \right| \le C_n(\theta)$$

is equal to a constant $\beta$ under the assumption that $\theta$ is the true value of the parameter.
Denote by $\theta'(x_1, \ldots, x_n)$ the root in $\theta$ of the equation

$$\frac{1}{\sqrt{n}} \frac{\partial}{\partial\theta} \sum_{\alpha=1}^{n} \log f(x_\alpha, \theta) = C_n(\theta)$$

and by $\theta''(x_1, \ldots, x_n)$ the root of

$$\frac{1}{\sqrt{n}} \frac{\partial}{\partial\theta} \sum_{\alpha=1}^{n} \log f(x_\alpha, \theta) = -C_n(\theta).$$

Under some weak assumptions on $f(x, \theta)$ the interval
$\delta_n(x_1, \ldots, x_n) = [\theta'(x_1, \ldots, x_n), \theta''(x_1, \ldots, x_n)]$
is in the limit with $n \to \infty$ a shortest unbiased confidence interval¹ of $\theta$ corresponding to
the confidence coefficient $\beta$. This confidence interval is identical with that given by S. S.
Wilks in his paper “Shortest average confidence intervals from large samples,” The Annals
of Mathematical Statistics, Sept. 1938. Wilks has shown that $\delta_n(x_1, \ldots, x_n)$ is
asymptotically shortest in the average compared with all confidence intervals computed on the
basis of statistics belonging to a certain class C. In the present paper it has been proved
that the confidence interval in question is asymptotically shortest compared with any
arbitrary unbiased confidence interval, without any restriction to a certain class of
functions.
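
As an illustration of the construction, consider the density $f(x, \theta) = \theta e^{-\theta x}$, which is not treated in the abstract and is chosen here only because the roots can be written down explicitly. The sketch below further assumes the large-sample normal approximation for the distribution of the derivative of the log likelihood, so that $C_n(\theta)$ is taken as $z/\theta$, with $z$ the normal deviate belonging to the confidence coefficient. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def score_interval_exponential(xs, beta=0.95):
    """Large-sample interval for theta in f(x, theta) = theta * exp(-theta * x).

    The derivative of the log likelihood is n/theta - sum(xs); each term has
    variance 1/theta**2, so C_n(theta) is approximated by z/theta (assumption).
    Setting (n/theta - S)/sqrt(n) equal to +z/theta and -z/theta and solving
    for theta gives the two roots returned below.
    """
    n = len(xs)
    total = sum(xs)
    z = NormalDist().inv_cdf((1.0 + beta) / 2.0)
    lower = (n - z * sqrt(n)) / total  # root belonging to +C_n(theta)
    upper = (n + z * sqrt(n)) / total  # root belonging to -C_n(theta)
    return lower, upper

# Invented sample: 100 observations, each equal to 0.5, so n / sum = 2.
lower, upper = score_interval_exponential([0.5] * 100, beta=0.95)
print(round(lower, 3), round(upper, 3))
```

The lower limit is positive only when $n$ exceeds $z^2$, which is always so in the large samples for which the approximation is intended.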

Reduction of Certain Composite Statistical Hypotheses. George W. Brown, 
R. H. Macy and Co., New York. 

The results obtained make it possible to reduce a large class of composite statistical
hypotheses to equivalent simple hypotheses. The fundamental theorem established states
essentially that if two distributions give rise, in sampling, to the same distribution of the
set of differences between observations, then one distribution must be a translation of the
other, subject to a condition requiring that the characteristic function of one of the
distributions be such that any interior intervals of zeros be not too large. The result is
established by means of the functional equation

$$\varphi(t_1)\,\varphi(t_2)\,\varphi(-t_1 - t_2) = \psi(t_1)\,\psi(t_2)\,\psi(-t_1 - t_2)$$

relating the characteristic functions. Similar results are obtained for scale, and
combination of location and scale, and the corresponding situations in multivariate
distributions. This type of uniqueness theorem permits one to reduce a composite hypothesis
involving an unknown location parameter (or scale, or both) to an equivalent simple
hypothesis.

¹ For the definition of a shortest unbiased confidence interval see the paper by J. Neyman,
“Outline of a theory of statistical estimation based on the classical theory of probability,”
Phil. Trans. Roy. Soc. (1937).

Conception of Equivalence in the Limit of Tests and Its Application to Certain
$\lambda$- and $\chi^2$-Tests. J. Neyman, University of California.

Denote by $E$ a system of observable variables and by $N$ the number of independent
observations of those variables to be used for testing a certain statistical hypothesis $H$
against a set $\Omega$ of admissible simple hypotheses $h$. Let further $T_1(N)$ and $T_2(N)$ be two
different tests of $H$ using the same number $N$ of observations. Consider the probability
$P_N(h)$, calculated on any admissible simple hypothesis $h$, of the two tests contradicting
themselves.

Definition: If, whatever be $h \in \Omega$, the probability $P_N(h)$ tends to zero as $N$ is indefinitely
increased, then the two tests are said to be equivalent in the limit.

Consider a number $s$ of series of independent trials and denote by $E_{i1}, E_{i2}, \ldots, E_{im_i}$
all the $m_i$ possible and mutually exclusive outcomes of each of the trials forming the $i$th
series. Let $p_{ij}$ be the probability of $E_{ij}$, $n_i$ the total number of trials in the $i$th series,
and $n_{ij}$ the number of these which give the outcome $E_{ij}$.

Suppose that it is desired to test a composite hypothesis $H$ concerning all the
probabilities $p_{ij}$ and consisting of the assumption that any one of them is a given linear function
of some $t$ independent parameters $\theta_k$, so that

$$p_{ij} = a_{ij0} + a_{ij1}\theta_1 + \cdots + a_{ijt}\theta_t \tag{1}$$

where the coefficients $a_{ijk}$ are known. The main result of the paper is then that the $\lambda$-test
of the above hypothesis $H$, tested against the set $\Omega$ of alternatives ascribing to the $p_{ij}$
any non-negative values, is equivalent in the limit to the test consisting of rejecting $H$
when the minimum of the expression

$$\sum_{i=1}^{s} \sum_{j=1}^{m_i} \frac{(n_{ij} - n_i p_{ij})^2}{n_{ij}} \tag{2}$$

calculated with respect to unrestricted variation of the $\theta$’s, exceeds the tabled value of $\chi^2_\epsilon$
corresponding to the chosen level of significance $\epsilon$ and to the number of degrees of freedom

$$\sum_{i=1}^{s} m_i - s - t.$$

It will be noticed that the expression (2) differs from the usual $\chi^2$ in the denominator
of each term.

As an example of the application of the test based on (2), consider the case where $M$
varieties of sugar beet are tested for resistance to a certain disease in an experiment
arranged in $N$ randomized blocks. Denote by $n$ the number of beets selected at random
for inspection from each plot and by $n_{ij}$ the number of those of the $i$th variety from the
plot in the $j$th block which are found to be infected. Denote further by $p_{ij}$ the proportion
of infected beets of the $i$th variety in the plot in the $j$th block. The hypothesis that the
effects of variety and of block are additive is expressed by $p_{ij} = p + V_i + B_j$ with
$\sum V_i = \sum B_j = 0$. To test this hypothesis we may use (2) which in this particular case
reduces itself to

$$\chi^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} w_{ij}\,(q_{ij} - p - V_i - B_j)^2 \tag{3}$$

with $w_{ij} = n^3 / (n_{ij}(n - n_{ij}))$, $q_{ij} = n_{ij}/n$. The minimum $\chi_0^2$ of $\chi^2$ is found by solving a
set of equations which are linear in $p$, $V_i$, $B_j$ and the comparison of $\chi_0^2$ with the tabled
value corresponding to $(M - 1)(N - 1)$ degrees of freedom will tell us whether we are
likely to be very wrong in assuming additivity or not. In the favorable case we may
next proceed similarly to test another hypothesis that there is no differentiation between
the varieties, so that $V_1 = V_2 = \cdots = V_M = 0$.
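
The minimization described in the sugar beet example may be sketched as follows. The expression to be minimized is the weighted sum of squares of $q_{ij} - p - V_i - B_j$ with weights $w_{ij} = n^3/(n_{ij}(n - n_{ij}))$ and $q_{ij} = n_{ij}/n$, and the minimum is found from a set of linear equations. The counts are invented for illustration, the function names are hypothetical, and the restrictions $\sum V_i = \sum B_j = 0$ are handled here by eliminating $V_M$ and $B_N$, which is one convenient choice and not necessarily that of the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def design_row(i, j, M, N):
    """Coefficients of (p, V_1..V_{M-1}, B_1..B_{N-1}) in p + V_i + B_j."""
    row = [1.0] + [0.0] * (M + N - 2)
    if i < M - 1:
        row[1 + i] = 1.0
    else:                      # V_M = -(V_1 + ... + V_{M-1})
        for k in range(M - 1):
            row[1 + k] = -1.0
    if j < N - 1:
        row[M + j] = 1.0
    else:                      # B_N = -(B_1 + ... + B_{N-1})
        for k in range(N - 1):
            row[M + k] = -1.0
    return row

def min_modified_chi_square(counts, n):
    """Return (minimum of (3), fitted parameters, degrees of freedom).

    counts[i][j] is the number infected out of n; every count must lie
    strictly between 0 and n so that the weights are finite.
    """
    M, N = len(counts), len(counts[0])
    k = M + N - 1
    normal = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    cells = []
    for i in range(M):
        for j in range(N):
            nij = counts[i][j]
            w = n ** 3 / (nij * (n - nij))
            q = nij / n
            d = design_row(i, j, M, N)
            cells.append((w, q, d))
            for a in range(k):
                rhs[a] += w * d[a] * q
                for b in range(k):
                    normal[a][b] += w * d[a] * d[b]
    theta = solve_linear(normal, rhs)
    chi2 = sum(w * (q - sum(x * t for x, t in zip(d, theta))) ** 2
               for w, q, d in cells)
    return chi2, theta, (M - 1) * (N - 1)

# Invented counts, 3 varieties by 2 blocks, n = 100 beets per plot.
additive = [[45, 35], [25, 15], [35, 25]]      # exactly additive proportions
non_additive = [[45, 35], [25, 15], [35, 45]]  # third variety reverses
chi2_add, theta_add, df = min_modified_chi_square(additive, 100)
chi2_non, theta_non, _ = min_modified_chi_square(non_additive, 100)
print(df, round(chi2_add, 6), round(chi2_non, 2))
```

For the first table the minimum is zero apart from rounding, as it must be when the observed proportions are exactly additive; for the second it is to be compared with the tabled value for two degrees of freedom.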

Empirical Comparison of the “Smooth” Test for Goodness of Fit with the
Pearson’s $\chi^2$ Test. J. Neyman, University of California.

In a previous publication* the author has deduced a test for goodness of fit, described 
as the “smooth test” or the $\psi^2$ test, applicable to cases where the hypothesis tested H
is simple. The test is so devised as to be particularly sensitive to departures from H
which are “smooth” in the sense explained in detail in the publication quoted. Whether
the test so devised does present any advantage over the usual $\chi^2$ test depends on how
frequently we meet, in practice, cases where the hypotheses alternative to the one tested 
are actually smooth. 

The present investigation was undertaken with the object of obtaining some information 
on this point. For that purpose a number of cases described in the literature where there 
was a question of testing that some observable variable x follows some perfectly specified 
distribution p(x) were analyzed. Of all such cases, the ones where there were a priori
theoretical reasons to believe that p(x) could not possibly represent the true distribution
of x and, at the most, it could be considered as only an approximation to the true
distribution were selected.

It was assumed that the departures from the hypothetical distributions are typical of 
those that may be met in practice when no definite information as to the actual state of 
affairs is available. The hypothesis of goodness of fit was tested both by means of the 
$\chi^2$ and by the fourth order smooth test. Out of the 130 cases studied the two tests were
in perfect agreement eight times. Out of the remaining 122 cases the smooth test proved
to be more sensitive than the $\chi^2$ in 70 cases and the $\chi^2$ better than the smooth test in 52
cases. We may further compare the tests by counting those cases where one of them
detected the falsehood of the hypothesis tested at a given level of significance while the
other failed to do so. At the level of significance .05 the $\chi^2$ test rejected the hypothesis
tested 13 times, while $P_{\psi^2}$ was > .05. The reverse was true in 17 cases. At the level of
significance .01 the corresponding figures are 5 and 14, again in favor of the smooth test.

* J. Neyman, “‘Smooth Test’ for Goodness of Fit,” Skandinavisk Aktuarietidskrift,
1937, pp. 149-199.



REPORT OF THE WAR PREPAREDNESS COMMITTEE OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The generally recognized functions of a statistician are the calculation of 
averages, percentages, and index numbers; the construction of bar graphs and 
pie diagrams; and the compilation of data in general. His other activities 
are less widely known. In particular, the recent advances in mathematical
statistics are known to a relatively small proportion of the persons occupying
responsible positions in academic life, in industry, and in government. The
mathematical statistician, in fact, is concerned chiefly with the interpretation
of data through the use of probability theory; his is the science of reasoning
from a part to the whole, and of prediction; and to him falls the task of stating
the conditions under which such inferences are possible, of devising means of
testing whether these conditions are satisfied, and of evaluating the
probability that such ‘uncertain inferences’ are correct in specific instances.
Furthermore, it is his responsibility to so plan the lay-out of experiments and the
conduct of surveys that the data they yield will contain the maximum
information on the points at issue and be amenable to unambiguous statistical
interpretation. 

Because of the functions which the mathematical statistician can perform his
services should be of value to the National Defense Program in the following 
fields: 

I. Quality Control and Specification. The functions of a mathematical 
statistical nature connected with quality control and specification of articles 
produced by mass production are: 

(1) Tests of randomness. These are important because statistical methods 
of inference are strictly valid only for random samples. 

(2) The use of probability theory in predicting the outcome of future repetitions 
of an operation which is in a state of statistical control.¹ The evaluation of the
probability that the quality of a piece of product will lie within any previously 
specified tolerance limits as long as a state of statistical control is maintained, 
and the development of sampling inspection techniques are examples of this 
function. 

¹ A repetitive operation, such as a production process, is said to be in a state of statistical
control when it produces a sequence of observations which exhibit the property
randomness. An important aspect of quality control is the improvement of quality which comes
as the result of an effort to reduce a manufacturing process to a state of statistical control. 
Furthermore, when this state of control is attained it is possible to gain a reduction in 
cost of inspection, a reduction in cost of rejections, a reduction in tolerance limits where 
quality measurement is indirect, and the attainment of uniform quality even though the 
inspection test is destructive. 



(3) Representative sampling. When a repetitive operation such as a production
process is not in a state of statistical control, it is not possible to make
valid inferences about the quality of a lot from an examination of a sample 
from the lot unless the sampling process is one of random selection within 
“strata” in accordance with the principles of representative sampling. 

(4) Analysis of variance. Reference is made here to the technique whereby 
the total variability of a product of an operation which is in a state of
statistical control can be decomposed into components associated with the various
sub-operations involved. 

(5) Correlation methods. When a direct measurement of quality is extremely
costly, it is sometimes advisable to use as an indirect measurement of quality 
the value of some character less costly to measure which is highly correlated 
with quality. 

(6) Specification of quality as a variable. Statistical theory, including tests 
for randomness, must be taken into account in writing quality specifications if 
the consumer is to be protected against the vagaries of sampling and the
producer safeguarded from the incurring of penalties of an unjust chance.
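
The tests of randomness mentioned under (1) may take many forms, and the report does not single out any one of them. As a sketch only, the following applies the familiar test based on the number of runs above and below the median, using the usual large-sample normal approximation. The function name and the two sequences are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, median

def runs_test(xs):
    """Runs above and below the median, with a normal approximation.

    Returns (number of runs, z, two-sided probability). Values equal to the
    median are dropped; the sequence must contain values on both sides of it.
    """
    med = median(xs)
    above = [x > med for x in xs if x != med]
    n1 = sum(above)
    n2 = len(above) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / sqrt(variance)
    prob = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return runs, z, prob

# A steadily rising sequence gives too few runs for a process in control.
trend_runs, trend_z, trend_p = runs_test(list(range(20)))
# A strictly alternating sequence gives too many.
alt_runs, alt_z, alt_p = runs_test([0, 1] * 10)
print(trend_runs, round(trend_z, 2), alt_runs, round(alt_z, 2))
```

Either too few or too many runs is evidence against randomness, which is why the probability is taken two-sided.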

II. Sampling Surveys. The importance of conducting sampling surveys 
in accordance with the principles of representative sampling is well established. 
It is quite possible that such surveys and partial censuses will be needed in 
connection with the National Defense Program in order to determine the 
frequency and location of individuals possessing special traits, e.g. persons 
capable of withstanding the rigours of dive bombing, or persons possessing 
types of color blindness which render them valuable as observers who can 
detect camouflage, etc. The “problem of sizes” connected with Stores and 
Supplies—see below—may require careful preliminary surveys. Also, surveys 
may be needed to evaluate the effects of various types of propaganda. 

III. Experimentation of Various Kinds. The mathematical statistician 
can be of service in connection with experimentation of various kinds
undertaken as a part of the National Defense Program since the following aspects
of experimentation are of a mathematical statistical nature: 

(1) Randomization. Since statistical tests for the existence of differences 
between samples, of correlation, etc. are strictly valid only for random samples, 
the operation of randomization is of paramount importance in “the comparison 
of new designs, new materials or alloys, study of contact phenomena under 
different conditions, corrosion of materials under different atmospheric
conditions, and field trial of equipment, to mention only a few.” If randomization
is not undertaken, observed differences between designs, for instance, may have
arisen from non-random assignable differences in the material presented.
Furthermore, the validity of tests for significant differences between the effects
of various designs rests upon the condition that the variability observed in 
the effects of each design be of random character and free from trends and 
non-random shifts in magnitude—i.e. the operation of determining the effects
of each design must be in a state of statistical control, to use a phrase employed
in quality control. 

(2) Experimental design. Without careful attention to the lay-out of an 
experiment, the data it yields may be difficult and even impossible to interpret. 
Therefore, the principles of experimental design set forth by R. A. Fisher and 
his followers are of great importance, as are also the special experimental
arrangements which have been devised to cope with many of the more usual
difficulties met in practice. 

IV. Personnel Selection. The allocation of individuals to places where 
they can be of greatest value in the National Defense Program will
undoubtedly require tests for mental and physical traits. Although the development
and analysis of such tests is largely in the hands of psychometric groups, the 
use of methods of multivariate statistical analysis in such work renders this 
field one in which mathematical statistics ought to play an important role. 


It is in the above four fields that there is special need for the training and 
endowments of the mathematical statistician. He can also render valuable 
assistance in the following fields: 

V. Stores and Supplies. 

(1) Problem of sizes. Preliminary surveys are likely to prove useful in 
ascertaining the relative frequencies of demand for the respective sizes of
clothing, etc. in different parts of the country.

(2) Development of procedures for charting the day to day location and
movement of stores and supplies.

(3) Problem of replacement of parts and equipment. In many it is more eco¬ 
nomical to make replacement at statistically determined times, than to wait 
for complete failure. 
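
The point made under (3) can be sketched numerically. The Python example below
is a minimal illustration, assuming a known distribution of part lifetimes in
whole periods and two hypothetical costs, one for a planned replacement and a
larger one for a failure in service. It computes the long-run cost per period
of replacing a part at age T (or at failure, if that comes first) and chooses
the age with the lowest cost. All names, and the policy itself, are
assumptions introduced for illustration and are not taken from the report.

```python
def cost_rate(pmf, age, cost_planned, cost_failure):
    """Long-run cost per period when a part is replaced at `age` or at failure.

    pmf[k - 1] is the probability that a part fails during period k.
    A part that fails in or before period `age` incurs cost_failure;
    a part that survives to the end of period `age` incurs cost_planned.
    """
    p_fail = sum(pmf[:age])
    expected_cost = cost_failure * p_fail + cost_planned * (1 - p_fail)
    # Expected cycle length E[min(lifetime, age)] = sum over k of P(lifetime >= k).
    expected_length = sum(1 - sum(pmf[:j]) for j in range(age))
    return expected_cost / expected_length


def best_replacement_age(pmf, cost_planned, cost_failure):
    """Return the replacement age (in periods) with the lowest cost rate."""
    ages = range(1, len(pmf) + 1)
    return min(ages, key=lambda a: cost_rate(pmf, a, cost_planned, cost_failure))
```

Setting the replacement age equal to the longest possible lifetime corresponds
to waiting for complete failure, so the two policies can be compared directly
with `cost_rate`.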

VI. Transportation and Communication. Probability theory has shown 
its usefulness in peace time in handling “traffic” problems that arise in telephone 
and telegraph communication, electric power distribution, etc. No doubt it 
will find corresponding application to problems in these fields arising out of the 
National Defense Program. 

VII. Gunnery and Bombing. Although there is a need in connection with 
artillery fire for further development of methods of estimating standard
deviations from successive differences in order to minimize the biases arising from
slowly changing conditions during the period of firing, the principles of artillery 
fire are quite firmly established and the relatively new science of bombing is 
likely to present greater opportunities for the application of the methods of 
mathematical statistics. For instance, in evaluating bombing techniques 
there is need of statistical methods in separating the constant biases from the 
random variability. 
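
The idea of estimating a standard deviation from successive differences may be
sketched as follows. The Python example is a minimal illustration under an
assumption of our own choosing: the estimate is taken as the square root of
the mean squared successive difference divided by two, which is one common
form; the report does not specify a formula. The function names are
hypothetical. When conditions drift slowly during firing, the ordinary
standard deviation about the mean absorbs the drift, whereas the estimate from
successive differences is affected much less.

```python
import math


def sd_from_successive_differences(xs):
    """Estimate sigma from squared differences of consecutive observations."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_sq_diff = sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))
    return math.sqrt(sum_sq_diff / (2 * (n - 1)))


def sd_about_mean(xs):
    """Ordinary sample standard deviation, for comparison."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
```

For a series that consists of a steady drift alone, such as 0, 1, 2, ..., 9,
the estimate from successive differences is far smaller than the ordinary
standard deviation, which treats the whole drift as random variability.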





VIII. Meteorology. The extent to which statistical methods are being 
employed in meteorology can be seen from an examination of the Monthly 
Weather Review Supplement No. 39, issued April 1940, and entitled “Reports 
on Critical Studies of Methods of Long-Range Weather Forecasting.” There 
seems to be excellent opportunity here for the application of methods of
multivariate analysis and for the development and use of methods applicable to
serially correlated data. Such work would be of value in National Defense 
so far as it would enable the forecasting of conditions suitable for launching an 
attack. 

IX. Medicine. The National Defense Program will probably require the
preparation and storage of hormone substances, toxic compounds, drugs, and other
medicinal supplies. Since many such are examined for potency, toxicity, etc. 
by means of animal assays, there will be considerable opportunity here for 
the sound application of mathematical statistics in planning and interpreting 
these bioassays. 

In nearly all of the above activities the application of mathematical statistics 
is likely to encounter two major difficulties: 

(1) Obtaining an adequate trial of the methods of mathematical statistics. 

(2) Supplying persons to occupy key positions in the application of
mathematical statistics in a given field, i.e., persons who are competent in
mathematical statistics and who possess a sound background in the field of
application.

In some of the above activities, e.g. Quality Control, there will be the further 
difficulty of 

(3) Supplying the vast number of slightly trained workers who will gather 
the data and perform the analyses. 

It is with these difficulties in mind that the Committee recommends that the 
Institute 

(1) Prepare a register of Institute members, stating for each member his
background, interests, and experience so far as these relate to mathematical
statistics and its applications;²

(2) Appoint a committee to handle inquiries concerning personnel qualified
to deal with particular projects;

(3) Cooperate to the fullest extent in matters pertaining to quality control
and specification with the Joint Committee for the Development of Statistical
Applications in Engineering and Manufacturing, of which the Institute is a
sponsor.³

(4) Undertake such steps as are feasible which will lead to cooperation with
other organizations having interests similar to those of the Institute, e.g. the
American Statistical Association, the Psychometric Society, and the
Econometric Society.

(5) Establish contact with the National Defense Research Committee headed
by Dr. Vannevar Bush and coordinate the Institute’s activities with those
of this national Committee.


In conclusion, we feel that as an organized group the Institute’s primary
function in relation to the National Defense Program should be to serve as a
reservoir of specialists, experienced in the use of the methods of mathematical
statistics, who can direct the use of these methods and be of assistance in the
development of new techniques as needed. As a secondary, but equally
important function, the Institute is in a position to supervise, and perhaps to
undertake through the activities of its individual members, the training in
mathematical statistics of the individuals who will be needed in the application
of whatever statistical programs of the type noted above are undertaken in
connection with the National Defense Program. It is recommended, therefore,
that the Institute’s interest in the above activities, and its willingness to be
called upon, be adequately publicized, possibly by sending copies of this report
to various members of the Government, such as the Chief Signal Officer and the
Coordinator of National Defense Purchases and also to the secretaries of
appropriate organizations, such as the American Standards Association, with the
request that they advise the Institute of any specific action they feel the
Institute should take.

A. T. Craig

E. G. Olds

L. E. Simon

R. E. Wareham

C. Eisenhart, Chairman.


² The preparation of this register should be coordinated with any similar undertaking
sponsored by the National Roster of Scientific and Specialized Personnel, National
Resources Planning Board, Executive Office of the President, Washington, D. C.

³ We suggest the following as possible undertakings in a cooperative program with the
Joint Committee:

(1) Requesting statements regarding the potential contribution to National Defense
of statistical methods in quality control and specification from men prominent in industry
who are familiar with recent developments in quality control. Such individuals would
be asked to give, where possible, concrete evidence of the value of such methods in their
experience; such evidence would be helpful in securing authoritative acceptance of
statistical methods in quality control and specification.

(2) The organization of a syllabus on statistical methods for use in evening courses
at various industrial centers. (Captain Simon of our Committee is preparing “An
Engineer’s Manual of Statistical Methods” which will be issued shortly.)

(3) The preparation of a list of topics for inclusion in university courses.

(4) The preparation of a list of suggested reading on statistical methods in quality
control and specification, arranged under such headings as “expository,” “methodology,”
etc.

(5) The arrangement of local meetings and round table discussions at some of the
universities in a few large industrial centers. Some well known leader of the locality might
serve as chairman. To such a meeting would be invited those men in local industries who
were interested in the possibility of applying statistical methods to their problems, and
the meeting could be thrown open to discussion after a brief paper outlining the
accomplishments of statistical methods of quality control in the speaker’s experience and
stating the advantages to be gained by employing such methods in the mass production of
the War Preparedness Program.

(6) Sponsoring the preparation of popular expository articles on quality control for
industrial journals, Reader’s Digest, Scientific American, etc., and other activities designed
to popularize the subject and gain authoritative acceptance of statistical methods of
quality control.













